Natural Language Toolkit: A Concise Tutorial
Synonym & Antonym Replacement
Replacing words with common synonyms
While working with NLP, especially for frequency analysis and text indexing, it is often beneficial to compress the vocabulary without losing meaning, because doing so can save a lot of memory. To achieve this, we need to define a mapping from each word to its synonym. In the example below, we will create a class named word_syn_replacer that replaces words with their common synonyms.
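To make the memory argument concrete, here is a minimal standard-library sketch that counts distinct words before and after mapping synonyms onto one canonical form. The SYNONYMS dictionary, the normalize helper and the token list are hypothetical examples, not NLTK data or APIs.

```python
from collections import Counter

# Hypothetical synonym map: every variant points to one canonical word.
SYNONYMS = {"bday": "birthday", "b-day": "birthday", "happy": "glad"}

def normalize(tokens, word_map):
    """Replace each token by its canonical form; unknown tokens pass through."""
    return [word_map.get(tok, tok) for tok in tokens]

tokens = ["happy", "bday", "to", "you", "glad", "b-day"]
before = Counter(tokens)
after = Counter(normalize(tokens, SYNONYMS))

print(len(before))  # 6 distinct words
print(len(after))   # 4 distinct words: glad, birthday, to, you
```

Fewer distinct keys means a smaller frequency table or index, while the counts for the merged words add up.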
Example
First, import the necessary packages: re to work with regular expressions, and wordnet from nltk.corpus.
import re
from nltk.corpus import wordnet
Next, create the class that takes a word replacement mapping:
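class word_syn_replacer(object):
    """Replace words with common synonyms using a word -> synonym mapping."""
    def __init__(self, word_map):
        self.word_map = word_map

    def replace(self, word):
        # Return the mapped synonym, or the word itself if it is not in the map.
        return self.word_map.get(word, word)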
Save this Python program (say, replacesyn.py) and run it from the Python command prompt. Then, whenever you want to replace words with their common synonyms, import the word_syn_replacer class. Let us see how.
from replacesyn import word_syn_replacer
rep_syn = word_syn_replacer({'bday': 'birthday'})
rep_syn.replace('bday')
Complete implementation example
import re
from nltk.corpus import wordnet
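class word_syn_replacer(object):
    """Replace words with common synonyms using a word -> synonym mapping."""
    def __init__(self, word_map):
        self.word_map = word_map

    def replace(self, word):
        # Return the mapped synonym, or the word itself if it is not in the map.
        return self.word_map.get(word, word)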
Once you have saved and run the above program, you can import the class and use it as follows:
from replacesyn import word_syn_replacer
rep_syn = word_syn_replacer({'bday': 'birthday'})
rep_syn.replace('bday')
Output
'birthday'
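Since replace() handles one word at a time, a whole tokenized sentence has to be normalized word by word. The sketch below repeats the class so it runs on its own, and assumes the text is already split into a list of tokens (the token list is a hypothetical example). It also shows that the lookup is an exact dictionary match, so case matters.

```python
class word_syn_replacer(object):
    """Same class as above: map a word to its synonym, or return it unchanged."""
    def __init__(self, word_map):
        self.word_map = word_map

    def replace(self, word):
        return self.word_map.get(word, word)

rep_syn = word_syn_replacer({'bday': 'birthday'})

# Apply replace() to every token; unmapped tokens come back unchanged.
tokens = ['happy', 'bday', 'to', 'you']
normalized = [rep_syn.replace(tok) for tok in tokens]
print(normalized)               # ['happy', 'birthday', 'to', 'you']

# 'Bday' is a different dictionary key from 'bday', so it is not replaced.
print(rep_syn.replace('Bday'))  # Bday
```

If the text mixes cases, lowercasing each token before calling replace() is one simple option.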
The disadvantage of the above method is that the synonyms have to be hardcoded in a Python dictionary. Two better alternatives are CSV and YAML files: we can store our synonym vocabulary in either kind of file and construct the word_map dictionary from it. Let us understand the concept with the help of examples.
Using CSV file
To use a CSV file for this purpose, the file should have two columns: the first column contains the word, and the second column contains the synonym that should replace it. Let us save this file as syn.csv. In the example below, we will create a class named CSVword_syn_replacer that extends word_syn_replacer in the replacesyn.py file and builds the word_map dictionary from syn.csv.
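For illustration, a syn.csv with two rows might look like the SAMPLE_CSV string below (hypothetical contents). The sketch parses it with csv.reader through io.StringIO, which stands in for the opened file so it runs without anything on disk.

```python
import csv
import io

# Hypothetical contents of syn.csv: word in column 1, replacement in column 2.
SAMPLE_CSV = "bday,birthday\nthx,thanks\n"

word_map = {}
for row in csv.reader(io.StringIO(SAMPLE_CSV)):
    word, syn = row          # each row must have exactly two fields
    word_map[word] = syn

print(word_map)  # {'bday': 'birthday', 'thx': 'thanks'}
```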
Example
First, import the necessary packages.
import csv
Next, create the class that builds the word replacement mapping from the CSV file:
class CSVword_syn_replacer(word_syn_replacer):
    """Build the word map from a two-column CSV file: word,synonym."""
    def __init__(self, fname):
        word_map = {}
        with open(fname, newline='') as f:
            for line in csv.reader(f):
                word, syn = line
                word_map[word] = syn
        super(CSVword_syn_replacer, self).__init__(word_map)
After running it, import the CSVword_syn_replacer class whenever you want to replace words with common synonyms. Let us see how.
from replacesyn import CSVword_syn_replacer
rep_syn = CSVword_syn_replacer('syn.csv')
rep_syn.replace('bday')
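One caveat: csv.reader returns an empty list for a blank line, so `word, syn = line` raises ValueError if syn.csv contains one (for example, a blank line between entries). A more tolerant loop could skip such rows. The helper name load_word_map is hypothetical, and stripping whitespace around fields is an added assumption, not part of the original class.

```python
import csv
import io

def load_word_map(f):
    """Hypothetical tolerant loader: keep only rows with exactly two fields."""
    word_map = {}
    for row in csv.reader(f):
        if len(row) != 2:
            continue  # skips blank lines (read as []) and malformed rows
        word, syn = (field.strip() for field in row)
        word_map[word] = syn
    return word_map

# io.StringIO stands in for open('syn.csv', newline='').
sample = io.StringIO("bday,birthday\n\nthx, thanks\n")
print(load_word_map(sample))  # {'bday': 'birthday', 'thx': 'thanks'}
```

Inside CSVword_syn_replacer.__init__, the opened file could be passed to such a helper instead of unpacking every row directly.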
Complete implementation example
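import csv

# Add this to replacesyn.py, which already defines word_syn_replacer.
class CSVword_syn_replacer(word_syn_replacer):
    """Build the word map from a two-column CSV file: word,synonym."""
    def __init__(self, fname):
        word_map = {}
        with open(fname, newline='') as f:
            for line in csv.reader(f):
                word, syn = line
                word_map[word] = syn
        super(CSVword_syn_replacer, self).__init__(word_map)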
Once you have saved and run the above program, you can import the class and use it as follows:
from replacesyn import CSVword_syn_replacer
rep_syn = CSVword_syn_replacer('syn.csv')
rep_syn.replace('bday')
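Because the contents of syn.csv are not shown, the following self-contained sketch writes a throwaway one-row syn.csv into a temporary directory and runs both classes against it. The file contents are hypothetical, and the classes are repeated from above so the block runs on its own.

```python
import csv
import os
import tempfile

class word_syn_replacer(object):
    def __init__(self, word_map):
        self.word_map = word_map

    def replace(self, word):
        return self.word_map.get(word, word)

class CSVword_syn_replacer(word_syn_replacer):
    def __init__(self, fname):
        word_map = {}
        with open(fname, newline='') as f:
            for line in csv.reader(f):
                word, syn = line
                word_map[word] = syn
        super(CSVword_syn_replacer, self).__init__(word_map)

# Write a temporary syn.csv with one hypothetical row, then load it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'syn.csv')
    with open(path, 'w', newline='') as f:
        csv.writer(f).writerow(['bday', 'birthday'])
    rep_syn = CSVword_syn_replacer(path)
    result = rep_syn.replace('bday')

print(result)  # birthday
```

The word map is built once in the constructor, so the temporary file can be deleted afterwards without affecting rep_syn.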
Using YAML file
Just as we used a CSV file, we can also use a YAML file for this purpose (this requires PyYAML to be installed). Let us save the file as syn.yaml. In the example below, we will create a class named YAMLword_syn_replacer that extends word_syn_replacer in the replacesyn.py file and builds the word_map dictionary from syn.yaml.
Example
First, import the necessary packages.
import yaml
Next, create the class that builds the word replacement mapping from the YAML file:
class YAMLword_syn_replacer(word_syn_replacer):
    def __init__(self, fname):
        with open(fname) as f:
            # safe_load: yaml.load() requires an explicit Loader in PyYAML 6+
            word_map = yaml.safe_load(f)
        super(YAMLword_syn_replacer, self).__init__(word_map)
After running it, import the YAMLword_syn_replacer class whenever you want to replace words with common synonyms. Let us see how.
from replacesyn import YAMLword_syn_replacer
rep_syn = YAMLword_syn_replacer('syn.yaml')
rep_syn.replace('bday')
Complete implementation example
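import yaml

# Add this to replacesyn.py, which already defines word_syn_replacer.
class YAMLword_syn_replacer(word_syn_replacer):
    def __init__(self, fname):
        with open(fname) as f:
            # safe_load: yaml.load() requires an explicit Loader in PyYAML 6+
            word_map = yaml.safe_load(f)
        super(YAMLword_syn_replacer, self).__init__(word_map)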
Once you have saved and run the above program, you can import the class and use it as follows:
from replacesyn import YAMLword_syn_replacer
rep_syn = YAMLword_syn_replacer('syn.yaml')
rep_syn.replace('bday')
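A syn.yaml containing the line `bday: birthday` loads as the dictionary {'bday': 'birthday'}, but an empty file loads as None and a YAML list loads as a Python list. Either would make replace() fail, since it calls .get() on the map. The hypothetical guard below checks the loaded value; to keep it runnable without PyYAML, it is fed the Python values that yaml.safe_load would return rather than calling the library.

```python
def check_word_map(data):
    """Hypothetical guard for the value returned by yaml.safe_load()."""
    if data is None:
        return {}  # treat an empty syn.yaml as "no replacements"
    if not isinstance(data, dict):
        raise ValueError("syn.yaml must contain word: synonym pairs")
    return data

print(check_word_map({'bday': 'birthday'}))  # {'bday': 'birthday'}
print(check_word_map(None))                  # {}
```

In YAMLword_syn_replacer, this would wrap the load call, e.g. `word_map = check_word_map(yaml.safe_load(f))`.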
Antonym replacement
An antonym is a word with the opposite meaning of another word, and antonym replacement is the counterpart of synonym replacement. In this section, we will deal with antonym replacement, i.e., replacing words with unambiguous antonyms by using WordNet. In the example below, we will create a class named word_antonym_replacer with two methods: one for replacing a word and the other for removing negations.
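The "unambiguous" rule can be seen without WordNet. The sketch below mirrors the collect-into-a-set-then-count logic used by word_antonym_replacer.replace, but swaps WordNet for a hypothetical toy lexicon (TOY_LEXICON) so it runs without NLTK data. The antonyms in it are illustrative, not WordNet's.

```python
# Hypothetical toy lexicon standing in for WordNet:
# word -> list of senses, each sense listing the antonyms of its lemmas.
TOY_LEXICON = {
    "uglify": [["beautify"]],
    "cold": [["hot"], ["warm"]],  # two senses with different antonyms
    "table": [[]],                # no antonyms at all
}

def unambiguous_antonym(word, lexicon=TOY_LEXICON):
    """Return the antonym only if all senses together yield exactly one."""
    antonyms = set()
    for sense in lexicon.get(word, []):
        antonyms.update(sense)
    if len(antonyms) == 1:
        return antonyms.pop()
    return None

print(unambiguous_antonym("uglify"))  # beautify
print(unambiguous_antonym("cold"))    # None (ambiguous: hot or warm)
print(unambiguous_antonym("table"))   # None (no antonym)
```

Using a set means the same antonym reached through several senses still counts once; only genuinely different candidates make the word ambiguous.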
Example
First, import the necessary packages.
from nltk.corpus import wordnet
Next, create the class named word_antonym_replacer:
class word_antonym_replacer(object):
    def replace(self, word, pos=None):
        # Collect antonyms from every lemma of every synset of the word.
        antonyms = set()
        for syn in wordnet.synsets(word, pos=pos):
            for lemma in syn.lemmas():
                for antonym in lemma.antonyms():
                    antonyms.add(antonym.name())
        # Return the antonym only if it is unambiguous (exactly one).
        if len(antonyms) == 1:
            return antonyms.pop()
        else:
            return None

    def replace_negations(self, sent):
        # Replace "not <word>" with the antonym of <word> when one exists.
        i, l = 0, len(sent)
        words = []
        while i < l:
            word = sent[i]
            if word == 'not' and i+1 < l:
                ant = self.replace(sent[i+1])
                if ant:
                    words.append(ant)
                    i += 2
                    continue
            words.append(word)
            i += 1
        return words
Save this Python program (say, replaceantonym.py) and run it from the Python command prompt. Then, whenever you want to replace words with their unambiguous antonyms, import the word_antonym_replacer class. Let us see how.
from replaceantonym import word_antonym_replacer
rep_antonym = word_antonym_replacer()
rep_antonym.replace('uglify')
Output
'beautify'
sentence = ["Let us", 'not', 'uglify', 'our', 'country']
rep_antonym.replace_negations(sentence)
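To trace replace_negations without WordNet data, the sketch below keeps its loop unchanged but plugs in a hypothetical stub replace() backed by a one-entry antonym table (StubAntonymReplacer and its ANTONYMS table are illustrative, not part of NLTK). It shows both outcomes: "not" followed by a word with a known antonym collapses into that antonym, while "not" before a word without one is kept.

```python
class StubAntonymReplacer(object):
    """Hypothetical stand-in: same replace_negations loop, toy antonym table."""
    ANTONYMS = {"uglify": "beautify"}

    def replace(self, word, pos=None):
        return self.ANTONYMS.get(word)  # None when no antonym is known

    def replace_negations(self, sent):
        i, l = 0, len(sent)
        words = []
        while i < l:
            word = sent[i]
            if word == 'not' and i + 1 < l:
                ant = self.replace(sent[i + 1])
                if ant:
                    # Drop 'not' and emit the antonym of the next word.
                    words.append(ant)
                    i += 2
                    continue
            words.append(word)
            i += 1
        return words

rep = StubAntonymReplacer()
print(rep.replace_negations(["Let us", 'not', 'uglify', 'our', 'country']))
# ['Let us', 'beautify', 'our', 'country']
print(rep.replace_negations(['do', 'not', 'go']))
# ['do', 'not', 'go']
```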
Complete implementation example
from nltk.corpus import wordnet

class word_antonym_replacer(object):
    def replace(self, word, pos=None):
        # Collect antonyms from every lemma of every synset of the word.
        antonyms = set()
        for syn in wordnet.synsets(word, pos=pos):
            for lemma in syn.lemmas():
                for antonym in lemma.antonyms():
                    antonyms.add(antonym.name())
        # Return the antonym only if it is unambiguous (exactly one).
        if len(antonyms) == 1:
            return antonyms.pop()
        else:
            return None

    def replace_negations(self, sent):
        # Replace "not <word>" with the antonym of <word> when one exists.
        i, l = 0, len(sent)
        words = []
        while i < l:
            word = sent[i]
            if word == 'not' and i+1 < l:
                ant = self.replace(sent[i+1])
                if ant:
                    words.append(ant)
                    i += 2
                    continue
            words.append(word)
            i += 1
        return words
Once you have saved and run the above program, you can import the class and use it as follows:
from replaceantonym import word_antonym_replacer
rep_antonym = word_antonym_replacer()
rep_antonym.replace('uglify')
sentence = ["Let us", 'not', 'uglify', 'our', 'country']
rep_antonym.replace_negations(sentence)