Python Text Processing 简明教程
Python - Stemming Algorithms
在自然语言处理领域,我们遇到单词具有共同词根的情况。例如,三个单词“agreed”、“agreeing”和“agreeable”具有相同的单词词根“agree”。任何包含这些单词的搜索都应将这些单词视为其词根的相同单词。因此,将所有单词链接到其词根至关重要。NLTK 库具有执行此链接并提供显示词根的输出的方法。
nltk 中提供了三种最常用的词干提取算法。它们给出的结果略有不同。下面的示例显示了所有三种词干提取算法及其结果的用法。
import nltk
from nltk.stem.porter import PorterStemmer
from nltk.stem.lancaster import LancasterStemmer
from nltk.stem import SnowballStemmer
porter_stemmer = PorterStemmer()
lanca_stemmer = LancasterStemmer()
sb_stemmer = SnowballStemmer("english",)
word_data = "Aging head of famous crime family decides to transfer his position to one of his subalterns"
# First Word tokenization
nltk_tokens = nltk.word_tokenize(word_data)
#Next find the roots of the word
print '***PorterStemmer****\n'
for w_port in nltk_tokens:
print "Actual: %s || Stem: %s" % (w_port,porter_stemmer.stem(w_port))
print '\n***LancasterStemmer****\n'
for w_lanca in nltk_tokens:
print "Actual: %s || Stem: %s" % (w_lanca,lanca_stemmer.stem(w_lanca))
print '\n***SnowballStemmer****\n'
for w_snow in nltk_tokens:
print "Actual: %s || Stem: %s" % (w_snow,sb_stemmer.stem(w_snow))
当我们运行以上程序时,我们得到了以下输出 −
***PorterStemmer****
Actual: Aging || Stem: age
Actual: head || Stem: head
Actual: of || Stem: of
Actual: famous || Stem: famou
Actual: crime || Stem: crime
Actual: family || Stem: famili
Actual: decides || Stem: decid
Actual: to || Stem: to
Actual: transfer || Stem: transfer
Actual: his || Stem: hi
Actual: position || Stem: posit
Actual: to || Stem: to
Actual: one || Stem: one
Actual: of || Stem: of
Actual: his || Stem: hi
Actual: subalterns || Stem: subaltern
***LancasterStemmer****
Actual: Aging || Stem: ag
Actual: head || Stem: head
Actual: of || Stem: of
Actual: famous || Stem: fam
Actual: crime || Stem: crim
Actual: family || Stem: famy
Actual: decides || Stem: decid
Actual: to || Stem: to
Actual: transfer || Stem: transf
Actual: his || Stem: his
Actual: position || Stem: posit
Actual: to || Stem: to
Actual: one || Stem: on
Actual: of || Stem: of
Actual: his || Stem: his
Actual: subalterns || Stem: subaltern
***SnowballStemmer****
Actual: Aging || Stem: age
Actual: head || Stem: head
Actual: of || Stem: of
Actual: famous || Stem: famous
Actual: crime || Stem: crime
Actual: family || Stem: famili
Actual: decides || Stem: decid
Actual: to || Stem: to
Actual: transfer || Stem: transfer
Actual: his || Stem: his
Actual: position || Stem: posit
Actual: to || Stem: to
Actual: one || Stem: one
Actual: of || Stem: of
Actual: his || Stem: his
Actual: subalterns || Stem: subaltern