PyBrain: A Concise Tutorial
PyBrain - Testing Network
In this chapter, we walk through an example in which we train a network on a dataset and then measure its error on the training, validation, and test data.
We will make use of the following trainer and training method:
BackpropTrainer
BackpropTrainer is a trainer that trains the parameters of a module according to a supervised dataset or a ClassificationDataSet (potentially sequential) by backpropagating the errors (through time).
trainUntilConvergence
This BackpropTrainer method trains the module on the dataset until it converges.
When we create a neural network, it is trained on the training data given to it. Whether the network has been trained properly is then judged by how well it predicts test data it has not seen during training.
Let us go step by step through a working example in which we build a neural network and measure its training, validation, and test errors.
Testing our Network
These are the steps we will follow to test our network:
- Import the required PyBrain and other packages
- Create a ClassificationDataSet
- Split the dataset into 25% test data and 75% training data
- Convert the test data and training data back into ClassificationDataSet objects
- Create a neural network
- Train the network
- Visualize the training and validation errors
- Compute the percentage error on the test data
Step 1
Import the required PyBrain and other packages, as shown below:
from sklearn import datasets
import matplotlib.pyplot as plt
from pybrain.datasets import ClassificationDataSet
from pybrain.utilities import percentError
from pybrain.tools.shortcuts import buildNetwork
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.structure.modules import SoftmaxLayer
from numpy import ravel
Step 2
The next step is to create a ClassificationDataSet.
For the data, we use the load_digits dataset from sklearn, as shown below. Refer to the sklearn documentation for details on load_digits.
digits = datasets.load_digits()
X, y = digits.data, digits.target
ds = ClassificationDataSet(64, 1, nb_classes=10)
# Each input is a 64-dimensional array, and the digits range from 0 to 9,
# so the dataset has 10 classes.
for i in range(len(X)):
    ds.addSample(ravel(X[i]), y[i])  # add each sample to the dataset
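To make the "64-dimensional input" concrete, here is a minimal sketch in plain Python (no PyBrain or sklearn needed) of how an 8x8 grid of pixel values becomes a flat list of 64 features. The `flatten` helper and the made-up pixel values are assumptions for illustration only. They are not part of PyBrain or sklearn.

```python
def flatten(image):
    """Hypothetical helper: turn a 2-D grid (list of rows) into one flat list."""
    return [pixel for row in image for pixel in row]

# Illustrative 8x8 "image" whose pixel values are just running indices.
image = [[row * 8 + col for col in range(8)] for row in range(8)]
features = flatten(image)

print(len(features))  # 64 values, one per pixel
```

Each sample added to the dataset has this shape: one flat vector of 64 numbers paired with a single class label.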
Step 3
Split the dataset into 25% test data and 75% training data:
test_data_temp, training_data_temp = ds.splitWithProportion(0.25)
Here we call the dataset's splitWithProportion() method with the value 0.25, which splits the dataset into two parts: 25% as test data and 75% as training data.
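The idea behind a proportional split can be sketched in plain Python. This is only an illustration of the technique, not PyBrain's implementation: the `split_with_proportion` helper, its `seed` parameter, and the rounding via `int()` are assumptions.

```python
import random

def split_with_proportion(samples, proportion, seed=None):
    """Hypothetical helper: shuffle the samples and split them into two lists.

    The first list holds roughly `proportion` of the samples,
    the second list holds the rest.
    """
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    cut = int(len(samples) * proportion)  # size of the first part
    left = [samples[i] for i in indices[:cut]]
    right = [samples[i] for i in indices[cut:]]
    return left, right

samples = list(range(1797))  # load_digits contains 1,797 samples
test_part, train_part = split_with_proportion(samples, 0.25, seed=0)
print(len(test_part), len(train_part))  # 449 1348
```

Every sample ends up in exactly one of the two parts, so the test data never overlaps with the training data.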
Step 4
Convert the test data and training data back into ClassificationDataSet objects:
test_data = ClassificationDataSet(64, 1, nb_classes=10)
for n in range(0, test_data_temp.getLength()):
    test_data.addSample(test_data_temp.getSample(n)[0], test_data_temp.getSample(n)[1])
training_data = ClassificationDataSet(64, 1, nb_classes=10)
for n in range(0, training_data_temp.getLength()):
    training_data.addSample(
        training_data_temp.getSample(n)[0], training_data_temp.getSample(n)[1]
    )
test_data._convertToOneOfMany()
training_data._convertToOneOfMany()
Calling splitWithProportion() on the dataset returns SupervisedDataSet objects, so we copy the samples back into ClassificationDataSet objects as shown in the step above. The _convertToOneOfMany() calls then re-encode each class label as one output value per class.
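The one-of-many (one-hot) encoding used in this step can be sketched in plain Python as follows. The `to_one_of_many` helper is hypothetical and only illustrates the encoding. It is not PyBrain's `_convertToOneOfMany()` implementation.

```python
def to_one_of_many(labels, nb_classes):
    """Hypothetical helper: encode integer class labels as one-hot rows."""
    encoded = []
    for label in labels:
        row = [0] * nb_classes
        row[label] = 1  # a single 1 marks the position of the class
        encoded.append(row)
    return encoded

encoded = to_one_of_many([3, 0], nb_classes=10)
print(encoded[0])  # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(encoded[1])  # [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

With this encoding, a network with 10 outputs can be compared against the target one output per class.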
Step 5
The next step is to create the neural network:
net = buildNetwork(training_data.indim, 64, training_data.outdim, outclass=SoftmaxLayer)
We create a network whose input and output dimensions are taken from the training data, with a hidden layer of 64 units and a softmax output layer.
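As a rough illustration of what a softmax output layer computes, the sketch below turns raw scores into positive values that sum to 1, so they can be read as class probabilities. It is a generic, standalone formulation with made-up scores, not PyBrain's SoftmaxLayer code.

```python
import math

def softmax(values):
    """Map raw scores to positive numbers that sum to 1."""
    shift = max(values)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])   # illustrative scores for three classes
predicted_class = probs.index(max(probs))
print(probs)
print(predicted_class)  # index of the largest score
```

The predicted class is simply the index of the largest output, which is why the class labels are encoded with one output per class.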
Step 6
Now comes the important part: training the network on the training dataset, as shown below:
trainer = BackpropTrainer(net, dataset=training_data,
    momentum=0.1, learningrate=0.01, verbose=True, weightdecay=0.01)
We create a BackpropTrainer for the network and pass it the training dataset, along with the momentum, learning rate, and weight decay settings.
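To give a feel for what the learningrate, momentum, and weightdecay settings do, here is a sketch of one common textbook form of the update rule, applied to a single weight. This is an assumption for illustration. PyBrain's internal update may differ in detail, and `sgd_step` is a hypothetical name.

```python
def sgd_step(weight, grad, velocity,
             learningrate=0.01, momentum=0.1, weightdecay=0.01):
    """One generic update: gradient descent with momentum and L2 weight decay."""
    # Weight decay adds a pull toward zero proportional to the weight itself.
    effective_grad = grad + weightdecay * weight
    # Momentum keeps a fraction of the previous step.
    velocity = momentum * velocity - learningrate * effective_grad
    return weight + velocity, velocity

# Minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, v = 0.0, 0.0
for _ in range(2000):
    w, v = sgd_step(w, 2 * (w - 3), v)

# Settles near 6 / 2.01, slightly below 3 because of the weight decay term.
print(round(w, 3))
```

A larger learning rate takes bigger steps, momentum smooths the direction of successive steps, and weight decay keeps the weights from growing too large.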
Step 7
The next step is to train the network and visualize the training and validation errors:
trnerr,valerr = trainer.trainUntilConvergence(dataset=training_data,maxEpochs=10)
plt.plot(trnerr,'b',valerr,'r')
plt.show()
We call trainUntilConvergence on the training data with maxEpochs=10, which caps training at 10 epochs. It returns the training errors and the validation errors, which we then plot. The blue line shows the training error and the red line shows the validation error.
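The stopping rule behind this kind of training loop can be sketched as follows: stop when the epoch cap is reached, or when the validation error has not improved for a number of consecutive epochs. This is a simplified illustration, not PyBrain's implementation. The function name, the `continue_epochs` parameter, and the error values are made up for the example.

```python
def epochs_until_stop(validation_errors, max_epochs, continue_epochs):
    """Hypothetical sketch: count how many epochs run before training stops.

    `validation_errors` stands in for the error measured after each epoch.
    """
    best_error = float("inf")
    stale_epochs = 0
    epochs_run = 0
    for error in validation_errors[:max_epochs]:  # hard cap on epochs
        epochs_run += 1
        if error < best_error:
            best_error = error
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs >= continue_epochs:
                break  # no improvement for a while, stop early
    return epochs_run

errors = [0.07, 0.03, 0.02, 0.025, 0.021, 0.022, 0.019]  # illustrative values
stopped_early = epochs_until_stop(errors, max_epochs=10, continue_epochs=3)
stopped_by_cap = epochs_until_stop(errors, max_epochs=4, continue_epochs=3)
print(stopped_early, stopped_by_cap)  # 6 4
```

In the first call, the error stops improving after the third epoch, so training ends three epochs later. In the second call, the epoch cap is hit first.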

The errors reported while running the above code are shown below:
Total error: 0.0432857814358
Total error: 0.0222276374185
Total error: 0.0149012052174
Total error: 0.011876985318
Total error: 0.00939854792853
Total error: 0.00782202445183
Total error: 0.00714707652044
Total error: 0.00606068893793
Total error: 0.00544257958975
Total error: 0.00463929281336
Total error: 0.00441275665294
('train-errors:', '[0.043286 , 0.022228 , 0.014901 , 0.011877 , 0.009399 , 0.007822 , 0.007147 , 0.006061 , 0.005443 , 0.004639 , 0.004413 ]')
('valid-errors:', '[0.074296 , 0.027332 , 0.016461 , 0.014298 , 0.012129 , 0.009248 , 0.008922 , 0.007917 , 0.006547 , 0.005883 , 0.006572 , 0.005811 ]')
The training error starts at about 0.043 and decreases with each epoch to about 0.004, which means the network is learning and improving with each epoch.
Step 8
Finally, we compute the percentage error on the test data using the percentError function, as shown below:
print('Percent Error on testData:',
      percentError(trainer.testOnClassData(dataset=test_data), test_data['class']))
Percent Error on testData − 3.34075723830735
The percent error is 3.34%, which means the neural network is about 96.7% accurate on the test data.
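The arithmetic behind a percent error is simply the share of predictions that differ from the true labels. The sketch below uses a hypothetical `percent_error` helper and made-up labels. The figures 449 and 15 are assumptions, chosen because 25% of the 1,797 digits samples is about 449, and 15 mismatches out of 449 is consistent with the 3.34% reported above.

```python
def percent_error(predicted, actual):
    """Hypothetical helper: percentage of positions where the sequences differ."""
    if len(predicted) != len(actual):
        raise ValueError("sequences must have the same length")
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return 100.0 * wrong / len(actual)

# Illustrative data: 449 labels with 15 deliberate mismatches.
actual = [i % 10 for i in range(449)]
predicted = list(actual)
for i in range(15):
    predicted[i] = (predicted[i] + 1) % 10  # force a wrong class

error = percent_error(predicted, actual)
print(error)          # about 3.34
print(100.0 - error)  # about 96.66, the matching accuracy
```

The accuracy is the complement of the percent error, which is where the roughly 96.7% figure comes from.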
Below is the full code:
from sklearn import datasets
import matplotlib.pyplot as plt
from pybrain.datasets import ClassificationDataSet
from pybrain.utilities import percentError
from pybrain.tools.shortcuts import buildNetwork
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.structure.modules import SoftmaxLayer
from numpy import ravel
digits = datasets.load_digits()
X, y = digits.data, digits.target
ds = ClassificationDataSet(64, 1, nb_classes=10)
for i in range(len(X)):
    ds.addSample(ravel(X[i]), y[i])
test_data_temp, training_data_temp = ds.splitWithProportion(0.25)
test_data = ClassificationDataSet(64, 1, nb_classes=10)
for n in range(0, test_data_temp.getLength()):
    test_data.addSample(test_data_temp.getSample(n)[0], test_data_temp.getSample(n)[1])
training_data = ClassificationDataSet(64, 1, nb_classes=10)
for n in range(0, training_data_temp.getLength()):
    training_data.addSample(
        training_data_temp.getSample(n)[0], training_data_temp.getSample(n)[1]
    )
test_data._convertToOneOfMany()
training_data._convertToOneOfMany()
net = buildNetwork(training_data.indim, 64, training_data.outdim, outclass=SoftmaxLayer)
trainer = BackpropTrainer(
    net, dataset=training_data, momentum=0.1,
    learningrate=0.01, verbose=True, weightdecay=0.01
)
trnerr,valerr = trainer.trainUntilConvergence(dataset=training_data,maxEpochs=10)
plt.plot(trnerr,'b',valerr,'r')
plt.show()
trainer.trainEpochs(10)
print('Percent Error on testData:', percentError(
    trainer.testOnClassData(dataset=test_data), test_data['class']
))