Python Text Processing: A Concise Tutorial

Python - Spelling Check

Spell checking is a basic requirement in text processing and analysis. The Python package pyspellchecker provides this feature: it finds words that may be misspelled and suggests possible corrections.

First, install the package in your Python environment with the following command.

 pip install pyspellchecker

The example below uses the package to flag misspelled words and to suggest possible corrections for each one.

from spellchecker import SpellChecker

spell = SpellChecker()

# find those words that may be misspelled
misspelled = spell.unknown(['let', 'us', 'wlak','on','the','groun'])

for word in misspelled:
    # Get the one `most likely` answer
    print(spell.correction(word))

    # Get a list of `likely` options
    print(spell.candidates(word))

Running the above program produces the following output (the candidates are returned as a set, so their order may vary between runs):

group
{'group', 'ground', 'groan', 'grout', 'grown', 'groin'}
walk
{'flak', 'weak', 'walk'}

Case Sensitive

If we pass Let in place of let, the word is compared case-sensitively against the closest matching words in the dictionary. Let is then reported as unknown as well, so the result looks different. How capitalized words are handled may depend on the installed pyspellchecker version.

from spellchecker import SpellChecker

spell = SpellChecker()

# find those words that may be misspelled
misspelled = spell.unknown(['Let', 'us', 'wlak','on','the','groun'])

for word in misspelled:
    # Get the one `most likely` answer
    print(spell.correction(word))

    # Get a list of `likely` options
    print(spell.candidates(word))

Running the above program produces the following output:

group
{'groin', 'ground', 'groan', 'group', 'grown', 'grout'}
walk
{'walk', 'flak', 'weak'}
get
{'aet', 'ret', 'get', 'cet', 'bet', 'vet', 'pet', 'wet', 'let', 'yet', 'det', 'het', 'set', 'et', 'jet', 'tet', 'met', 'fet', 'net'}
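
If capitalized words such as Let should not be flagged, one option is to lowercase the input before checking it. The sketch below assumes a hypothetical helper, `unknown_ignoring_case`, that wraps any function with the same shape as `spell.unknown` (it takes a list of words and returns the set of unknown ones). A toy stand-in, `toy_unknown`, is used here so the example runs without the package installed.

```python
def unknown_ignoring_case(words, unknown_fn):
    """Lowercase words before checking, then report the original spellings.

    unknown_fn is any callable that takes a list of words and returns
    the set of words it does not recognize.
    """
    # Map each lowercased word back to the first original spelling seen.
    originals = {}
    for word in words:
        originals.setdefault(word.lower(), word)
    flagged = unknown_fn(list(originals))
    return {originals[word] for word in flagged}


# Hypothetical stand-in for spell.unknown, backed by a toy vocabulary.
TOY_KNOWN = {"let", "us", "walk", "on", "the", "ground"}


def toy_unknown(words):
    return {word for word in words if word not in TOY_KNOWN}


result = unknown_ignoring_case(["Let", "us", "wlak", "on", "the", "groun"], toy_unknown)
print(sorted(result))  # ['groun', 'wlak']
```

With pyspellchecker installed, `spell.unknown` could be passed in place of `toy_unknown`, so that only wlak and groun are sent on to `spell.correction` and `spell.candidates`.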