Python Text Processing Tutorial
Python - Chunks and Chinks
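Chunking is the process of grouping adjacent words into phrases based on their part-of-speech tags. In the example below, we define a grammar that the chunks must follow. The grammar specifies the order of tags, such as determiners, adjectives, and nouns, that makes up a chunk. The program prints the chunk tree and also draws it as a picture.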
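import nltk
# A sentence that has already been tagged with parts of speech
sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]
# NP chunk: an optional determiner, any number of adjectives, then a noun
grammar = "NP: {<DT>?<JJ>*<NN>}"
cp = nltk.RegexpParser(grammar)
result = cp.parse(sentence)
print(result)
result.draw()  # opens a window that draws the chunk tree
When we run the above program, we get the following output:
(S
  (NP The/DT small/JJ red/JJ flower/NN)
  flew/VBD
  through/IN
  (NP the/DT window/NN))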
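To make the tag pattern less opaque, here is a minimal sketch of the underlying idea using only the standard re module. It assumes the pattern <DT>?<JJ>*<NN> (optional determiner, any adjectives, then a noun) and treats the tags of the sentence as one string to be searched. The helper chunk_spans is hypothetical and written only for illustration; it is not part of NLTK and does not reproduce everything RegexpParser does.

```python
import re

sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]

def chunk_spans(tagged, tag_regex):
    """Return the word groups whose tags match tag_regex (hypothetical helper)."""
    # Encode the tag sequence as a single string: "<DT><JJ><JJ><NN>..."
    tag_string = "".join(f"<{tag}>" for _, tag in tagged)
    chunks = []
    for match in re.finditer(tag_regex, tag_string):
        if match.start() == match.end():
            continue  # ignore empty matches
        # Each tag contributes exactly one "<", so counting "<" before an
        # offset converts a character position into a token index.
        start = tag_string.count("<", 0, match.start())
        end = tag_string.count("<", 0, match.end())
        chunks.append([word for word, _ in tagged[start:end]])
    return chunks

# Assumed pattern: optional determiner, any number of adjectives, then a noun
noun_phrases = chunk_spans(sentence, r"(<DT>)?(<JJ>)*<NN>")
print(noun_phrases)
```

The sketch finds two groups, "The small red flower" and "the window", which correspond to the two NP subtrees that such a grammar would produce.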
Changing the grammar gives a different output, as shown below.
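import nltk
sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]
# NP chunk without the noun: an optional determiner and any number of adjectives
grammar = "NP: {<DT>?<JJ>*}"
chunkprofile = nltk.RegexpParser(grammar)
result = chunkprofile.parse(sentence)
print(result)
result.draw()  # opens a window that draws the chunk tree
When we run the above program, we get the following output:
(S
  (NP The/DT small/JJ red/JJ)
  flower/NN
  flew/VBD
  through/IN
  (NP the/DT)
  window/NN)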
Chinking
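Chinking is the process of removing a sequence of tokens from a chunk. If the sequence appears in the middle of a chunk, those tokens are removed, leaving two chunks where there was only one before.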
import nltk
sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]
grammar = r"""
NP:
    {<.*>+}        # Chunk everything
    }<JJ|NN>+{     # Chink sequences of JJ and NN
"""
chunkprofile = nltk.RegexpParser(grammar)
result = chunkprofile.parse(sentence)
print(result)
result.draw()  # opens a window that draws the chunk tree
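The effect of chinking can be sketched without NLTK. The example below assumes that the whole sentence starts out as a single chunk and that JJ and NN are the tags being chinked. The function chink and the ("NP", words) tuple format are hypothetical, chosen only to illustrate the idea, and do not mirror NLTK's Tree objects.

```python
from itertools import groupby

sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]

def chink(tagged, chink_tags):
    """Split one all-inclusive chunk around tokens tagged with chink_tags."""
    pieces = []
    # groupby yields consecutive runs of tokens that are all chinked
    # or all kept, in sentence order.
    for is_chink, group in groupby(tagged, key=lambda pair: pair[1] in chink_tags):
        words = [word for word, _ in group]
        if is_chink:
            pieces.extend(words)          # chinked tokens sit outside any chunk
        else:
            pieces.append(("NP", words))  # remaining tokens form a smaller chunk
    return pieces

pieces = chink(sentence, {"JJ", "NN"})
print(pieces)
```

Removing "small red flower" from the middle leaves two chunks, "The" and "flew through the", while "window" is removed from the edge of the second chunk and does not create a new one.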