spaCy Concise Tutorial
spaCy - Updating Neural Network Model
In this chapter, we will learn how to update the neural network model in spaCy.
Reasons to update
The following are reasons to update an existing model:
- The updated model will provide better results on your specific domain.
- While updating an existing model, the model can learn classification schemes specific to your problem.
- Updating an existing model is essential for text classification.
- It is especially useful for named entity recognition.
- It is less critical for POS tagging and dependency parsing.
Updating an existing model
With spaCy, we can update an existing pre-trained model with more data. For example, we can update the model to improve its predictions on different kinds of text.
Updating an existing pre-trained model is very useful if you want to improve categories the model already knows, such as "person" or "organization". We can also update an existing pre-trained model to add new categories.
Always update an existing pre-trained model with examples of the new category as well as examples of the other categories that the model previously predicted correctly. Otherwise, improving the new category might hurt the model's performance on the other categories.
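As a minimal sketch of this advice, the snippet below mixes examples of a new category with "revision" examples of categories the model already handles. The texts, the `build_training_set` helper, and the `(text, {"entities": [(start, end, label)]})` layout are assumptions chosen for illustration; in practice the revision examples would come from predictions of the existing model that you have checked to be correct.

```python
import random

# Hypothetical examples of the new category (GADGET).
new_examples = [
    ("I bought a new iPhone", {"entities": [(15, 21, "GADGET")]}),
    ("The Kindle is on sale", {"entities": [(4, 10, "GADGET")]}),
]

# Hypothetical revision examples: categories the model already predicts correctly.
revision_examples = [
    ("Tim Cook leads Apple", {"entities": [(0, 8, "PERSON"), (15, 20, "ORG")]}),
    ("Satya Nadella works at Microsoft",
     {"entities": [(0, 13, "PERSON"), (23, 32, "ORG")]}),
]


def build_training_set(new_examples, revision_examples, seed=0):
    """Combine new-category and revision examples into one shuffled list."""
    combined = list(new_examples) + list(revision_examples)
    random.Random(seed).shuffle(combined)  # seeded so the order is reproducible
    return combined


train_data = build_training_set(new_examples, revision_examples)

# Labels that the combined training set covers.
labels_seen = {label for _, ann in train_data for _, _, label in ann["entities"]}
print(sorted(labels_seen))
```

Training on `train_data` rather than on `new_examples` alone keeps the old categories in front of the model while it learns the new one.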
Setting up a new pipeline
The following steps show how to set up a new pipeline from scratch and update it with training examples:
- First, we start with a blank English model by using the spacy.blank method. It contains only the language data and tokenization rules and does not have any pipeline components.
- After that, we create a blank entity recognizer and add it to the pipeline. Next, we add the new string labels to the model by using add_label.
- Now, we can initialize the model with random weights by calling nlp.begin_training.
- Next, we loop over the examples several times and randomly shuffle the data on each iteration, which helps the model reach better accuracy.
- Once shuffled, we divide the examples into batches by using spaCy's minibatch function. Finally, we update the model with the texts and annotations of each batch and continue the loop.
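The control flow of these steps can be sketched without spaCy at all. In the snippet below, `train` and `record_update` are hypothetical names: `update_fn` stands in for the model's update call, and plain list slicing stands in for spaCy's minibatch function, so this only illustrates the order of shuffling, batching, and updating.

```python
import random


def train(examples, update_fn, n_iter=10, batch_size=2, seed=0):
    """Shuffle the examples on every iteration, then feed them in batches."""
    rng = random.Random(seed)
    examples = list(examples)  # copy so the caller's list is not reordered
    for _ in range(n_iter):
        rng.shuffle(examples)
        for start in range(0, len(examples), batch_size):
            batch = examples[start:start + batch_size]
            texts = [text for text, _ in batch]
            annotations = [annotation for _, annotation in batch]
            update_fn(texts, annotations)


# Stand-in for the model update: it only records what it was given.
calls = []


def record_update(texts, annotations):
    calls.append((texts, annotations))


examples = [
    ("I bought a new iPhone", {"entities": [(15, 21, "GADGET")]}),
    ("The Kindle is on sale", {"entities": [(4, 10, "GADGET")]}),
    ("The weather is nice today", {"entities": []}),
]

train(examples, record_update, n_iter=10, batch_size=2)
print(len(calls))  # 3 examples in batches of 2 -> 2 updates per iteration
```

With three examples and a batch size of two, each iteration produces one full batch and one batch with a single example.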
Examples
Given below is an example of starting with a blank English model by using spacy.blank:
import spacy
nlp = spacy.blank("en")
The following is an example of creating a blank entity recognizer and adding it to the pipeline. This two-step form is the spaCy v2 API; in spaCy v3, nlp.add_pipe("ner") creates the component, adds it, and returns it in a single call.
ner = nlp.create_pipe("ner")
nlp.add_pipe(ner)
Here is an example of adding a new label by using add_label:
ner.add_label("GADGET")
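When there are several labels, they can be derived from the annotated examples instead of being typed by hand. The sketch below assumes the `(text, {"entities": [(start, end, label)]})` layout for training data; `collect_labels` and the example texts are hypothetical, and the final add_label loop is shown only as a comment because it needs the `ner` component created above.

```python
def collect_labels(examples):
    """Return the sorted set of entity labels used in the annotations."""
    labels = set()
    for _text, annotations in examples:
        for _start, _end, label in annotations.get("entities", []):
            labels.add(label)
    return sorted(labels)


examples = [
    ("I bought a new iPhone", {"entities": [(15, 21, "GADGET")]}),
    ("Tim Cook unveiled the iPad",
     {"entities": [(0, 8, "PERSON"), (22, 26, "GADGET")]}),
    ("The weather is nice today", {"entities": []}),
]

labels = collect_labels(examples)
print(labels)

# With the entity recognizer from the previous step, each label would be added:
# for label in labels:
#     ner.add_label(label)
```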
An example of starting the training by using nlp.begin_training is as follows:
nlp.begin_training()
This is an example of training for several iterations and shuffling the data on each iteration:
import random
for itn in range(10):
    random.shuffle(examples)
This is an example of dividing the examples into batches using the minibatch utility function:
for batch in spacy.util.minibatch(examples, size=2):
    texts = [text for text, annotation in batch]
    annotations = [annotation for text, annotation in batch]
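To see what the batches look like, here is a small pure-Python stand-in for the batching step. `minibatch_sketch` is a hypothetical helper that only mimics the fixed-size case of spaCy's minibatch function, and the example texts are made up.

```python
def minibatch_sketch(items, size):
    """Yield consecutive lists of at most `size` items."""
    items = list(items)
    for start in range(0, len(items), size):
        yield items[start:start + size]


examples = [
    ("I bought a new iPhone", {"entities": [(15, 21, "GADGET")]}),
    ("The Kindle is on sale", {"entities": [(4, 10, "GADGET")]}),
    ("The weather is nice today", {"entities": []}),
    ("We walked to the park", {"entities": []}),
    ("Dinner was late again", {"entities": []}),
]

# Five examples with size=2 give two full batches and one smaller final batch.
batch_sizes = [len(batch) for batch in minibatch_sketch(examples, size=2)]
print(batch_sizes)  # [2, 2, 1]

# zip(*batch) is an alternative way to split a batch into texts and annotations.
first_batch = next(minibatch_sketch(examples, size=2))
texts, annotations = zip(*first_batch)
```

Note that the last batch can be smaller than the requested size, so the update step should not assume a fixed batch length.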
Given below is an example of updating the model with texts and annotations:
nlp.update(texts, annotations)
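Before passing annotations to the update step, it can help to check that the entity offsets are sane. The following sketch assumes entities are given as `(start, end, label)` character offsets; `find_annotation_problems` is a hypothetical helper, not part of spaCy, and it checks only two things: that each span lies inside the text and that spans do not overlap.

```python
def find_annotation_problems(text, entities):
    """Return a list of human-readable problems found in the entity spans."""
    problems = []
    last_end = 0
    for start, end, label in sorted(entities):
        # The span must be non-empty and lie within the text.
        if not (0 <= start < end <= len(text)):
            problems.append(f"{label}: span ({start}, {end}) is outside the text")
            continue
        # Spans are visited in order of start offset, so an overlap shows up
        # as a span that starts before the previous one has ended.
        if start < last_end:
            problems.append(f"{label}: span ({start}, {end}) overlaps another entity")
        last_end = max(last_end, end)
    return problems


good = find_annotation_problems(
    "Tim Cook leads Apple", [(0, 8, "PERSON"), (15, 20, "ORG")]
)
out_of_range = find_annotation_problems(
    "I bought a new iPhone", [(15, 25, "GADGET")]
)
overlapping = find_annotation_problems(
    "Tim Cook leads Apple", [(0, 8, "PERSON"), (4, 8, "PERSON")]
)
print(good, out_of_range, overlapping, sep="\n")
```

Running such a check over the whole training set once, before the training loop, makes annotation mistakes easier to locate than an error raised in the middle of an update.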