PyBrain: A Concise Tutorial
PyBrain - Overview
PyBrain is an open-source machine learning library implemented in Python. It provides easy-to-use training algorithms for networks, along with datasets and trainers for training and testing a network.
The official documentation defines PyBrain as follows:
PyBrain is a modular Machine Learning Library for Python. Its goal is to offer flexible, easy-to-use yet still powerful algorithms for Machine Learning Tasks and a variety of predefined environments to test and compare your algorithms.
PyBrain is short for Python-Based Reinforcement Learning, Artificial Intelligence, and Neural Network Library. In fact, we came up with the name first and later reverse-engineered this quite descriptive "Backronym".
Features of Pybrain
The following are the features of PyBrain:
Networks
A network is composed of modules that are joined by connections. PyBrain supports neural networks such as feed-forward networks and recurrent networks.
A feed-forward network is a neural network in which information moves between nodes in the forward direction only and never travels backward. It is the earliest and simplest type of artificial neural network.
Information passes from the input nodes to the hidden nodes and then to the output node.
A recurrent network is similar to a feed-forward network; the difference is that it has to remember the data at each step, so the history of each step has to be saved.
Datasets
A dataset is the data given to a network for testing, validation and training. The type of dataset to use depends on the machine learning task at hand. The datasets most commonly used with PyBrain are SupervisedDataSet and ClassificationDataSet.
SupervisedDataSet: It consists of input and target fields. It is the simplest form of dataset and is mainly used for supervised learning tasks.
ClassificationDataSet: It is mainly used for classification problems. It takes input and target fields plus an extra field called "class", which is an automated backup of the targets given. For example, the output is either 1 or 0, or the outputs are grouped by value based on the input given, so that each sample falls into one particular class.
Trainer
When we create a neural network, it is trained on the training data given to it. Whether the network has been trained properly is judged by how well it predicts the test data. The most important concepts in PyBrain training are BackpropTrainer and trainUntilConvergence.
BackpropTrainer: A trainer that trains the parameters of a module according to a supervised or ClassificationDataSet dataset (potentially sequential) by backpropagating the errors (through time).
trainUntilConvergence: A trainer method that trains the module on the dataset until it converges.
Advantages of Pybrain
The advantages of PyBrain are:
- PyBrain is a free, open-source library for learning machine learning. It is a good starting point for any newcomer interested in the subject.
- PyBrain is implemented in Python, which makes development fast compared with languages such as Java or C++.
- PyBrain works easily with other Python libraries for visualizing data.
- PyBrain supports popular network types such as feed-forward networks and recurrent networks.
- Loading datasets from .csv files is very easy in PyBrain. It also allows using datasets from other libraries.
- Training and testing on data are easy with PyBrain trainers.
Limitations of Pybrain
Little help is available for problems you run into with PyBrain. Some questions on Stack Overflow and the Google Group remain unanswered.
Workflow of Pybrain
According to the PyBrain documentation, the machine learning flow is shown in the following figure:
At the start, we have raw data, which can be used with PyBrain after preprocessing.
The PyBrain flow starts with datasets, which are divided into training data and test data.
- The network is created, and the dataset and the network are given to the trainer.
- The trainer trains the network on the data and reports the training error and validation error, which can be visualized.
- The test data can then be validated to see whether the output matches the training data.
Terminology
There are some important terms to know when working with PyBrain for machine learning. They are as follows:
Total Error: The error reported after the network is trained. If the error keeps changing on every iteration, the network still needs time to settle. Once the error stays constant between iterations, the network has converged and will remain the same regardless of any additional training.
Training data: The data used to train the PyBrain network.
Testing data: The data used to test the trained PyBrain network.
Trainer: When we create a neural network, it is trained on the training data given to it. Whether the network has been trained properly is judged by how well it predicts the test data. The most important concepts in PyBrain training are BackpropTrainer and trainUntilConvergence.
BackpropTrainer: A trainer that trains the parameters of a module according to a supervised or ClassificationDataSet dataset (potentially sequential) by backpropagating the errors (through time).
trainUntilConvergence: A trainer method that trains the module on the dataset until it converges.
Layers: Layers are basically sets of functions that are applied in the hidden layers of a network.
Connections: A connection works much like a layer; the only difference is that it moves data from one node to another in the network.
Modules: Modules are units of a network, each consisting of an input buffer and an output buffer.
Supervised Learning: Here we have both inputs and outputs, and we use an algorithm to map the inputs to the outputs. The algorithm learns from the given training data by iterating over it, and the iteration stops when the algorithm predicts the correct outputs.
Unsupervised Learning: Here we have inputs but do not know the outputs. The role of unsupervised learning is to learn as much as possible from the data given.
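The Total Error entry above describes convergence as the error settling to a constant value. A minimal sketch of that idea in plain Python follows. The function name, the tolerance and the patience value are assumptions made for illustration, and this is not the stopping rule that PyBrain itself applies.

```python
def has_converged(errors, tolerance=1e-4, patience=3):
    """Return True when the last `patience` changes in error are all below `tolerance`."""
    if len(errors) < patience + 1:
        return False
    recent = errors[-(patience + 1):]
    return all(abs(a - b) < tolerance for a, b in zip(recent, recent[1:]))

# Hypothetical per-epoch total errors: one run still changing, one that has gone flat.
still_settling = [0.50, 0.31, 0.20, 0.14]
settled = [0.50, 0.20, 0.01002, 0.01001, 0.01000, 0.01000]

print(has_converged(still_settling))  # False
print(has_converged(settled))         # True
```

A larger patience value makes the check less likely to stop on a short plateau, at the cost of running more epochs.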
PyBrain - Environment Setup
In this chapter, we will install PyBrain. To start working with PyBrain, we first need to install Python. So we will do the following:
- Install Python
- Install PyBrain
Installing Python
To install Python, go to the official Python site, www.python.org/downloads, as shown below, and click the latest version available for Windows, Linux/Unix or macOS. Download the 64-bit or 32-bit build that matches your operating system.
Once the download has finished, run the .exe file and follow the steps to install Python on your system.
The Python package manager, pip, is installed by default with the above installation. To make Python work globally on your system, add its location to the PATH variable. The installer offers this at the start, so remember to check the checkbox labelled Add to PATH. If you forget to check it, follow the steps below to add Python to PATH.
Add to PATH
To add Python to PATH, follow these steps:
- Right-click your Computer icon and click Properties → Advanced System Settings.
- It will display the screen as shown below.
- Click Environment Variables as shown above. It will display the screen as shown below.
Select Path, click the Edit button, and add the location of your Python installation at the end. Now let us check the Python version.
Installing PyBrain
Now that we have installed Python, we are going to install PyBrain. Clone the PyBrain repository as shown below:
git clone https://github.com/pybrain/pybrain.git
C:\pybrain>git clone https://github.com/pybrain/pybrain.git
Cloning into 'pybrain'...
remote: Enumerating objects: 2, done.
remote: Counting objects: 100% (2/2), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 12177 (delta 0), reused 0 (delta 0), pack-reused 12175
Receiving objects: 100% (12177/12177), 13.29 MiB | 510.00 KiB/s, done.
Resolving deltas: 100% (8506/8506), done.
Now run cd pybrain and then the following command:
python setup.py install
This command installs PyBrain on your system.
Once done, to check whether PyBrain is installed, open a command prompt and start the Python interpreter as shown below:
C:\pybrain\pybrain>python
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 22:22:05) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
We can now import pybrain as shown below:
>>> import pybrain
>>>
If import pybrain runs without errors, PyBrain has been installed successfully. You can now write code to start working with PyBrain.
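The same check can be done from a script without triggering an ImportError. The sketch below uses only the Python standard library; the helper name is_installed is an assumption made for illustration.

```python
import importlib.util

def is_installed(module_name):
    """Return True if `module_name` can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None

print("pybrain installed:", is_installed("pybrain"))
```

This only reports whether the package can be located; it does not confirm that the installed version works with your Python version.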
PyBrain - Introduction to PyBrain Networks
PyBrain is a library developed for machine learning with Python. One of the important concepts in machine learning is the network. A network is composed of modules that are joined by connections.
The layout of a simple neural network is as follows:
PyBrain supports neural networks such as feed-forward networks and recurrent networks.
A feed-forward network is a neural network in which information moves between nodes in the forward direction only and never travels backward. It is the earliest and simplest type of artificial neural network. Information passes from the input nodes to the hidden nodes and then to the output node.
Here is a simple feed-forward network layout.
The circles are modules, and the lines with arrows are the connections between modules.
The nodes A, B, C and D are input nodes.
H1, H2, H3 and H4 are hidden nodes, and O is the output.
In the above network, we have 4 input nodes, 4 hidden nodes in one hidden layer, and 1 output. Each line in the diagram corresponds to a weight parameter of the model that is adjusted during training.
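To make the 4-4-1 layout concrete, here is a small sketch of a forward pass in plain Python. It assumes the diagram is fully connected (every input joined to every hidden node, and every hidden node joined to the output), uses a sigmoid on the hidden nodes, and picks fixed illustrative weights; none of these details are stated in the text.

```python
import math

def forward(inputs, hidden_weights, output_weights):
    """One forward pass: inputs -> sigmoid hidden nodes -> linear output."""
    hidden = [
        1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(row, inputs))))
        for row in hidden_weights
    ]
    return sum(w * h for w, h in zip(output_weights, hidden))

n_inputs, n_hidden = 4, 4
# Illustrative fixed weights; a trainer would adjust these values.
hidden_weights = [[0.1] * n_inputs for _ in range(n_hidden)]  # A..D -> H1..H4
output_weights = [0.25] * n_hidden                            # H1..H4 -> O

# One weight per line in a fully connected diagram.
n_weights = n_inputs * n_hidden + n_hidden * 1
print(n_weights)  # 20
print(forward([1, 0, 1, 0], hidden_weights, output_weights))
```

Data only flows from the inputs through the hidden nodes to the output, which is what makes the network feed-forward.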
A recurrent network is similar to a feed-forward network; the difference is that it has to remember the data at each step, so the history of each step has to be saved.
Here is a simple layout of a recurrent network:
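The idea of remembering earlier steps can be sketched with a single recurrent unit in plain Python. The weights w_in and w_rec and the tanh activation are assumptions chosen for illustration, not values taken from PyBrain.

```python
import math

def recurrent_step(x, previous_hidden, w_in=0.5, w_rec=0.9):
    """Combine the current input with the hidden state kept from the last step."""
    return math.tanh(w_in * x + w_rec * previous_hidden)

def run_sequence(sequence):
    hidden = 0.0  # no history before the first step
    history = []
    for x in sequence:
        hidden = recurrent_step(x, hidden)
        history.append(hidden)
    return history

# The final input is 1.0 in both sequences, but the history differs.
after_zeros = run_sequence([0.0, 0.0, 1.0])[-1]
after_ones = run_sequence([1.0, 1.0, 1.0])[-1]
print(after_zeros, after_ones)
```

The two results differ even though the last input is the same, because the hidden state carries information forward from earlier steps.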
PyBrain - Working With Networks
A network is composed of modules that are joined by connections. In this chapter, we will learn to:
- Create a network
- Analyze a network
Creating Network
We will use the Python interpreter to execute our code. To create a network in PyBrain, we use the buildNetwork API as shown below:
C:\pybrain\pybrain>python
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 22:22:05) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>>
>>> from pybrain.tools.shortcuts import buildNetwork
>>> network = buildNetwork(2, 3, 1)
>>>
We have created a network using buildNetwork() with the parameters 2, 3, 1, which means the network has 2 inputs, 3 hidden units in one hidden layer, and a single output.
Below are the details of the network, i.e., its modules and connections:
C:\pybrain\pybrain>python
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 22:22:05) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from pybrain.tools.shortcuts import buildNetwork
>>> network = buildNetwork(2,3,1)
>>> print(network)
FeedForwardNetwork-8
Modules:
[<BiasUnit 'bias'>, <LinearLayer 'in'>, <SigmoidLayer 'hidden0'>, <LinearLayer 'out'>]
Connections:
[<FullConnection 'FullConnection-4': 'hidden0' -> 'out'>, <FullConnection 'FullConnection-5': 'in' -> 'hidden0'>, <FullConnection 'FullConnection-6': 'bias' -> 'out'>, <FullConnection 'FullConnection-7': 'bias' -> 'hidden0'>]
>>>
The modules are layers, and the connections are FullConnection objects. Each module and connection is named as shown above.
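The four connections listed above suggest how many trainable weights such a network holds. The sketch below counts them in plain Python, assuming one weight per pair of connected units and a single bias unit feeding every non-input layer; the function name count_parameters is hypothetical.

```python
def count_parameters(layer_sizes, bias=True):
    """Count weights in a fully connected layered network.

    Assumes one weight per (source unit, destination unit) pair and, when
    `bias` is True, one bias weight per unit in every non-input layer.
    """
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    bias_weights = sum(layer_sizes[1:]) if bias else 0
    return weights + bias_weights

print(count_parameters([2, 3, 1]))              # 13
print(count_parameters([2, 3, 1], bias=False))  # 9
```

For the 2, 3, 1 network this gives 6 weights for 'in' -> 'hidden0', 3 for 'hidden0' -> 'out', 3 for 'bias' -> 'hidden0' and 1 for 'bias' -> 'out'.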
PyBrain - Working with Datasets
A dataset is the input data given to a network for testing, validation and training. The type of dataset to use depends on the machine learning task at hand. In this chapter, we will look at the following:
- Creating a dataset
- Adding data to a dataset
We will first learn how to create a dataset and test it with the given input.
Creating Dataset
To create a dataset, we use the PyBrain dataset package: pybrain.datasets.
PyBrain supports dataset classes such as SupervisedDataSet, SequentialDataSet and ClassificationDataSet. The dataset to use depends on the machine learning task you are trying to implement. SupervisedDataSet is the simplest one, and it is the one we will use here.
A SupervisedDataSet needs the parameters input and target. Consider the XOR truth table shown below:
A | B | A XOR B
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0
Each row has two inputs and produces one output, so the dataset has an input dimension of 2 and a target dimension of 1. The dataset is therefore created with the sizes 2 and 1.
createdataset.py
from pybrain.datasets import SupervisedDataSet
sds = SupervisedDataSet(2, 1)
print(sds)
This is what we get when we run the above code with python createdataset.py:
C:\pybrain\pybrain\src>python createdataset.py
input: dim(0, 2)
[]
target: dim(0, 1)
[]
It displays an input of size 2 and a target of size 1, as shown above.
Adding Data to Dataset
Let us now add sample data to the dataset.
createdataset.py
from pybrain.datasets import SupervisedDataSet
sds = SupervisedDataSet(2, 1)
xorModel = [
[(0,0), (0,)],
[(0,1), (1,)],
[(1,0), (1,)],
[(1,1), (0,)],
]
for input, target in xorModel:
    sds.addSample(input, target)
print("Input is:")
print(sds['input'])
print("\nTarget is:")
print(sds['target'])
We have created the xorModel list as shown below:
xorModel = [
[(0,0), (0,)],
[(0,1), (1,)],
[(1,0), (1,)],
[(1,1), (0,)],
]
To add data to the dataset, we use the addSample() method, which takes an input and a target.
We loop through the xorModel list and pass each pair to addSample(), as shown below:
for input, target in xorModel:
    sds.addSample(input, target)
After running the script, we get the following output:
python createdataset.py
C:\pybrain\pybrain\src>python createdataset.py
Input is:
[[0. 0.]
[0. 1.]
[1. 0.]
[1. 1.]]
Target is:
[[0.]
[1.]
[1.]
[0.]]
You can get the input and target data from the dataset by indexing it with 'input' and 'target', as shown below:
print(sds['input'])
print(sds['target'])
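Instead of typing a truth table by hand, the sample pairs can be generated. The following sketch uses only the standard library and produces pairs in the same (input, target) shape as the xorModel list; the helper truth_table is an assumption made for illustration and is not part of PyBrain.

```python
from itertools import product

def truth_table(gate):
    """Return (input, target) pairs for every combination of two binary inputs."""
    return [((a, b), (gate(a, b),)) for a, b in product((0, 1), repeat=2)]

xor_samples = truth_table(lambda a, b: a ^ b)
for inp, target in xor_samples:
    print(inp, target)
```

Each generated pair could then be passed to addSample() in a loop, in the same way as the hand-written list.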
PyBrain - Datasets Types
A dataset is the data given to a network for testing, validation and training. The type of dataset to use depends on the machine learning task at hand. We will discuss the various dataset types in this chapter.
We can work with datasets by importing from the following package:
pybrain.datasets
SupervisedDataSet
SupervisedDataSet consists of input and target fields. It is the simplest form of dataset and is mainly used for supervised learning tasks.
Below is how you can use it in code:
from pybrain.datasets import SupervisedDataSet
The methods available on SupervisedDataSet are as follows:
splitWithProportion(proportion=0.10)
This divides the dataset into two parts. The first part contains the given proportion of the dataset and the second part contains the rest; for example, with a proportion of 0.10 the first part holds 10% of the data and the second holds the remaining 90%. You can choose the proportion freely. The two parts can be used for testing and training your network.
copy(): Returns a deep copy of the dataset.
clear(): Clears the dataset.
saveToFile(filename, format=None, **kwargs)
Saves the object to the file given by filename.
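The behaviour described for splitWithProportion() can be illustrated on a plain list. This is a standalone sketch under the assumption that the samples are shuffled before the cut; the seed parameter is added here for repeatability and is not a PyBrain argument.

```python
import random

def split_with_proportion(samples, proportion=0.10, seed=0):
    """Shuffle `samples` and split them into (first part, remainder)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * proportion)
    return shuffled[:cut], shuffled[cut:]

samples = list(range(100))
small, large = split_with_proportion(samples, proportion=0.10)
print(len(small), len(large))  # 10 90
```

Every sample ends up in exactly one of the two parts, so the smaller part can serve as test data and the larger one as training data.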
Example
Here is a working example using SupervisedDataSet:
testnetwork.py
from pybrain.tools.shortcuts import buildNetwork
from pybrain.structure import TanhLayer
from pybrain.datasets import SupervisedDataSet
from pybrain.supervised.trainers import BackpropTrainer
# Create a network with two inputs, three hidden, and one output
nn = buildNetwork(2, 3, 1, bias=True, hiddenclass=TanhLayer)
# Create a dataset that matches network input and output sizes:
norgate = SupervisedDataSet(2, 1)
# Create a dataset to be used for testing.
nortrain = SupervisedDataSet(2, 1)
# Add input and target values to dataset
# Values for NOR truth table
norgate.addSample((0, 0), (1,))
norgate.addSample((0, 1), (0,))
norgate.addSample((1, 0), (0,))
norgate.addSample((1, 1), (0,))
# Add input and target values to dataset
# Values for NOR truth table
nortrain.addSample((0, 0), (1,))
nortrain.addSample((0, 1), (0,))
nortrain.addSample((1, 0), (0,))
nortrain.addSample((1, 1), (0,))
# Train the network with the norgate dataset.
trainer = BackpropTrainer(nn, norgate)
# Run 1000 training epochs.
for epoch in range(1000):
    trainer.train()
# Test the trained network on the nortrain dataset.
trainer.testOnData(dataset=nortrain, verbose=True)
Output
The output of the above program is as follows:
python testnetwork.py
C:\pybrain\pybrain\src>python testnetwork.py
Testing on data:
('out: ', '[0.887 ]')
('correct:', '[1 ]')
error: 0.00637334
('out: ', '[0.149 ]')
('correct:', '[0 ]')
error: 0.01110338
('out: ', '[0.102 ]')
('correct:', '[0 ]')
error: 0.00522736
('out: ', '[-0.163]')
('correct:', '[0 ]')
error: 0.01328650
('All errors:', [0.006373344564625953, 0.01110338071737218, 0.005227359234093431, 0.01328649974219942])
('Average error:', 0.008997646064572746)
('Max error:', 0.01328649974219942, 'Median error:', 0.01110338071737218)
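The summary lines above can be reproduced from the per-sample errors. The sketch below is an inference from the printed numbers rather than a reading of the PyBrain source: each error appears to be half the squared difference between output and target, and the reported median matches the upper middle element of the sorted errors.

```python
def sample_error(output, target):
    """Half of the squared difference, summed over the output values."""
    return 0.5 * sum((o - t) ** 2 for o, t in zip(output, target))

def summarize(errors):
    ordered = sorted(errors)
    return {
        "average": sum(errors) / len(errors),
        "max": ordered[-1],
        # Upper middle element, which is what the printed median matches.
        "median": ordered[len(ordered) // 2],
    }

# Per-sample errors copied from the output above.
errors = [0.006373344564625953, 0.01110338071737218,
          0.005227359234093431, 0.01328649974219942]
summary = summarize(errors)
print(summary)

# The rounded output 0.887 gives approximately the first per-sample error.
print(sample_error([0.887], [1]))
```

Because the printed outputs are rounded to three decimals, recomputing an error from them only matches the listed value approximately.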
ClassificationDataSet
This dataset is mainly used for classification problems. It takes input and target fields plus an extra field called "class", which is an automated backup of the targets given. For example, the output is either 1 or 0, or the outputs are grouped by value based on the input given, so that each sample falls into one particular class.
Here is how you can use it in code:
from pybrain.datasets import ClassificationDataSet
Syntax
ClassificationDataSet(inp, target=1, nb_classes=0, class_labels=None)
The methods available on ClassificationDataSet are as follows:
addSample(inp, target): Adds a new sample consisting of an input and a target.
splitByClass(): Returns two new datasets. The first contains the samples of the selected class (0..nClasses-1), and the second contains the remaining samples.
_convertToOneOfMany(): Converts the target classes to a 1-of-k representation, retaining the old targets in the field class.
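The 1-of-k representation mentioned for _convertToOneOfMany() can be shown on plain lists. This sketch illustrates the encoding only; the function to_one_of_many is hypothetical and does not reproduce how PyBrain stores its fields.

```python
def to_one_of_many(class_indices, nb_classes):
    """Turn class indices into 1-of-k target vectors."""
    return [
        [1 if k == index else 0 for k in range(nb_classes)]
        for index in class_indices
    ]

classes = [0, 2, 1]  # kept separately, like the "class" field
targets = to_one_of_many(classes, nb_classes=3)
print(targets)  # [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
```

Each target vector has exactly one entry set to 1, at the position of the sample's class, which suits an output layer with one unit per class.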
Here is a working example of ClassificationDataSet.
Example
from sklearn import datasets
import matplotlib.pyplot as plt
from pybrain.datasets import ClassificationDataSet
from pybrain.utilities import percentError
from pybrain.tools.shortcuts import buildNetwork
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.structure.modules import SoftmaxLayer
from numpy import ravel
digits = datasets.load_digits()
X, y = digits.data, digits.target
ds = ClassificationDataSet(64, 1, nb_classes=10)
for i in range(len(X)):
    ds.addSample(ravel(X[i]), y[i])
# Split off 25% of the samples for testing and keep the rest for training
test_data_temp, training_data_temp = ds.splitWithProportion(0.25)
# Copy each part into a fresh ClassificationDataSet
test_data = ClassificationDataSet(64, 1, nb_classes=10)
for n in range(0, test_data_temp.getLength()):
    test_data.addSample(test_data_temp.getSample(n)[0], test_data_temp.getSample(n)[1])
training_data = ClassificationDataSet(64, 1, nb_classes=10)
for n in range(0, training_data_temp.getLength()):
    training_data.addSample(training_data_temp.getSample(n)[0], training_data_temp.getSample(n)[1])
# Convert the targets to a 1-of-k representation
test_data._convertToOneOfMany()
training_data._convertToOneOfMany()
net = buildNetwork(training_data.indim, 64, training_data.outdim, outclass=SoftmaxLayer)
trainer = BackpropTrainer(
    net, dataset=training_data, momentum=0.1, learningrate=0.01, verbose=True, weightdecay=0.01
)
trnerr,valerr = trainer.trainUntilConvergence(dataset=training_data,maxEpochs=10)
plt.plot(trnerr,'b',valerr,'r')
plt.show()
trainer.trainEpochs(10)
print('Percent Error on testData:',percentError(trainer.testOnClassData(dataset=test_data), test_data['class']))
The dataset used in the above example is the digits dataset, whose classes run from 0 to 9, so there are 10 classes. The input size is 64, the target size is 1, and the number of classes is 10.
The code trains the network on the dataset and plots the training error and validation error. It also prints the percent error on the test data, as follows:
Output
Total error: 0.0432857814358
Total error: 0.0222276374185
Total error: 0.0149012052174
Total error: 0.011876985318
Total error: 0.00939854792853
Total error: 0.00782202445183
Total error: 0.00714707652044
Total error: 0.00606068893793
Total error: 0.00544257958975
Total error: 0.00463929281336
Total error: 0.00441275665294
('train-errors:', '[0.043286 , 0.022228 , 0.014901 , 0.011877 , 0.009399 , 0.007822 , 0.007147 , 0.006061 , 0.005443 , 0.004639 , 0.004413 ]')
('valid-errors:', '[0.074296 , 0.027332 , 0.016461 , 0.014298 , 0.012129 , 0.009248 , 0.008922 , 0.007917 , 0.006547 , 0.005883 , 0.006572 , 0.005811 ]')
Percent Error on testData: 3.34075723830735
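The percent error is the share of predictions that differ from the expected class. A minimal plain-Python sketch of that calculation follows; the function percent_error below is a stand-in written for illustration, not PyBrain's own percentError. The count of 449 test samples with 15 mistakes is an inference from the printed percentage and is not stated in the text.

```python
def percent_error(predicted, expected):
    """Percentage of positions where `predicted` differs from `expected`."""
    if len(predicted) != len(expected):
        raise ValueError("sequences must have the same length")
    wrong = sum(1 for p, e in zip(predicted, expected) if p != e)
    return 100.0 * wrong / len(expected)

print(percent_error([3, 1, 4, 1], [3, 1, 4, 2]))  # 25.0

# Hypothetical reconstruction: 15 wrong predictions out of 449 samples.
reconstructed = percent_error([0] * 434 + [1] * 15, [0] * 449)
print(reconstructed)
```

The reconstructed value agrees with the reported percent error to many decimal places, which is why that sample count seems plausible.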
PyBrain - Importing Data For Datasets
In this chapter, we will learn how to obtain data for use with PyBrain datasets.
The most common ways to obtain data are:
- Using sklearn
- From a CSV file
Using sklearn
Here is the link with details of the datasets available from sklearn: https://scikit-learn.org/stable/datasets/toy_dataset.html
The ClassificationDataSet example in the previous chapter shows one way to use an sklearn dataset: it loads the digits dataset with datasets.load_digits() and adds each sample to a ClassificationDataSet.
From CSV file
We can also use data from a CSV file as follows:
Here is sample data for the XOR truth table, stored in datasettest.csv:
Here is a working example that reads data from the .csv file into a dataset.
Example
from pybrain.tools.shortcuts import buildNetwork
from pybrain.structure import TanhLayer
from pybrain.datasets import SupervisedDataSet
from pybrain.supervised.trainers import BackpropTrainer
import pandas as pd
print('Read data...')
df = pd.read_csv('data/datasettest.csv',header=0).head(1000)
data = df.values
train_output = data[:,0]
train_data = data[:,1:]
print(train_output)
print(train_data)
# Create a network with two inputs, three hidden units, and one output
nn = buildNetwork(2, 3, 1, bias=True, hiddenclass=TanhLayer)
# Create a dataset that matches the network input and output sizes
_gate = SupervisedDataSet(2, 1)
# Add the input and target values read from the CSV file (XOR truth table)
for i in range(0, len(train_output)):
    _gate.addSample(train_data[i], train_output[i])
# Train the network with the _gate dataset
trainer = BackpropTrainer(nn, _gate)
# Run 1000 training epochs
for epoch in range(1000):
    trainer.train()
# Test the trained network on the same dataset
trainer.testOnData(dataset=_gate, verbose=True)
As the example shows, pandas is used to read the data from the CSV file.
Output
C:\pybrain\pybrain\src>python testcsv.py
Read data...
[0 1 1 0]
[[0 0]
 [0 1]
 [1 0]
 [1 1]]
Testing on data:
('out: ', '[0.004 ]')
('correct:', '[0 ]')
error: 0.00000795
('out: ', '[0.997 ]')
('correct:', '[1 ]')
error: 0.00000380
('out: ', '[0.996 ]')
('correct:', '[1 ]')
error: 0.00000826
('out: ', '[0.004 ]')
('correct:', '[0 ]')
error: 0.00000829
('All errors:', [7.94733477723902e-06, 3.798267582566822e-06, 8.260969076585322e-06, 8.286246525558165e-06])
('Average error:', 7.073204490487332e-06)
('Max error:', 8.286246525558165e-06, 'Median error:', 8.260969076585322e-06)
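The contents of datasettest.csv are not shown in the text. The printed arrays suggest a header row followed by lines holding the target in the first column and the two inputs after it. Under that assumption, the following sketch reads the same data with only the standard library; the column names and the helper load_samples are hypothetical.

```python
import csv
import io

# Hypothetical file contents: a header row, then target, A, B on each line.
CSV_TEXT = """output,a,b
0,0,0
1,0,1
1,1,0
0,1,1
"""

def load_samples(handle):
    """Return (inputs, target) pairs, taking the target from the first column."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    samples = []
    for row in reader:
        values = [float(cell) for cell in row]
        samples.append((tuple(values[1:]), (values[0],)))
    return samples

samples = load_samples(io.StringIO(CSV_TEXT))
for inputs, target in samples:
    print(inputs, target)
```

To read a real file, the StringIO object would be replaced by an open file handle, and each pair could then be passed to addSample().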
PyBrain - Training Datasets on Networks
So far, we have seen how to create a network and a dataset. To make datasets and networks work together, we use trainers.
Below is a working example that shows how a dataset is given to a network, and how the network is then trained and tested using a trainer.
testnetwork.py
from pybrain.tools.shortcuts import buildNetwork
from pybrain.structure import TanhLayer
from pybrain.datasets import SupervisedDataSet
from pybrain.supervised.trainers import BackpropTrainer
# Create a network with two inputs, three hidden, and one output
nn = buildNetwork(2, 3, 1, bias=True, hiddenclass=TanhLayer)
# Create a dataset that matches network input and output sizes:
norgate = SupervisedDataSet(2, 1)
# Create a dataset to be used for testing.
nortrain = SupervisedDataSet(2, 1)
# Add input and target values to dataset
# Values for NOR truth table
norgate.addSample((0, 0), (1,))
norgate.addSample((0, 1), (0,))
norgate.addSample((1, 0), (0,))
norgate.addSample((1, 1), (0,))
# Add input and target values to dataset
# Values for NOR truth table
nortrain.addSample((0, 0), (1,))
nortrain.addSample((0, 1), (0,))
nortrain.addSample((1, 0), (0,))
nortrain.addSample((1, 1), (0,))
# Train the network with the norgate dataset.
trainer = BackpropTrainer(nn, norgate)
# Run 1000 training epochs.
for epoch in range(1000):
    trainer.train()
# Test the trained network on the nortrain dataset.
trainer.testOnData(dataset=nortrain, verbose=True)
To train and test the network on the dataset, we need BackpropTrainer. BackpropTrainer is a trainer that trains the parameters of a module according to a supervised dataset (potentially sequential) by backpropagating the errors (through time).
We have created two datasets of class SupervisedDataSet. We use the NOR data model, which is as follows:
A | B | A NOR B
0 | 0 | 1
0 | 1 | 0
1 | 0 | 0
1 | 1 | 0
The above data model is used to train the network.
norgate = SupervisedDataSet(2, 1)
# Add input and target values to dataset
# Values for NOR truth table
norgate.addSample((0, 0), (1,))
norgate.addSample((0, 1), (0,))
norgate.addSample((1, 0), (0,))
norgate.addSample((1, 1), (0,))
The following dataset is used for testing:
# Create a dataset to be used for testing.
nortrain = SupervisedDataSet(2, 1)
# Add input and target values to dataset
# Values for NOR truth table
nortrain.addSample((0, 0), (1,))
nortrain.addSample((0, 1), (0,))
nortrain.addSample((1, 0), (0,))
nortrain.addSample((1, 1), (0,))
The trainer is used as follows:
# Train the network with the norgate dataset.
trainer = BackpropTrainer(nn, norgate)
# Run 1000 training epochs.
for epoch in range(1000):
    trainer.train()
To test on the dataset, we can use the code below:
trainer.testOnData(dataset=nortrain, verbose = True)
Output
C:\pybrain\pybrain\src>python testnetwork.py
Testing on data:
('out: ', '[0.887 ]')
('correct:', '[1 ]')
error: 0.00637334
('out: ', '[0.149 ]')
('correct:', '[0 ]')
error: 0.01110338
('out: ', '[0.102 ]')
('correct:', '[0 ]')
error: 0.00522736
('out: ', '[-0.163]')
('correct:', '[0 ]')
error: 0.01328650
('All errors:', [0.006373344564625953, 0.01110338071737218, 0.005227359234093431, 0.01328649974219942])
('Average error:', 0.008997646064572746)
('Max error:', 0.01328649974219942, 'Median error:', 0.01110338071737218)
If you check the output, the network's predictions almost match the targets in the test dataset, and hence the average error is about 0.009.
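The per-sample errors in the listing above are consistent with half the squared difference between the output and the target, and the average error is their mean. The sketch below reproduces those figures from the rounded outputs shown in the listing. The formula is inferred from the printed numbers, not taken from PyBrain's source, and `half_squared_error` is a hypothetical helper name.

```python
def half_squared_error(output, target):
    """Half the squared difference between one output and its target."""
    return 0.5 * (output - target) ** 2


# Rounded outputs and targets copied from the test listing above.
outputs = [0.887, 0.149, 0.102, -0.163]
targets = [1, 0, 0, 0]

errors = [half_squared_error(o, t) for o, t in zip(outputs, targets)]
average_error = sum(errors) / len(errors)
print(errors)
print(average_error)
```

Because the outputs are rounded to three decimals, the results agree with the listing only to about four decimal places.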
Let us now change the test data and look at the average error. We have changed the target outputs as shown below.
Following is the dataset used to test:
# Create a dataset to be used for testing.
nortrain = SupervisedDataSet(2, 1)
# Add input and target values to the test dataset
# Target values changed so that they no longer follow the NOR truth table
nortrain.addSample((0, 0), (0,))
nortrain.addSample((0, 1), (1,))
nortrain.addSample((1, 0), (1,))
nortrain.addSample((1, 1), (0,))
Let us now test it.
Output
C:\pybrain\pybrain\src>python testnetwork.py
Testing on data:
('out: ', '[0.988 ]')
('correct:', '[0 ]')
error: 0.48842978
('out: ', '[0.027 ]')
('correct:', '[1 ]')
error: 0.47382097
('out: ', '[0.021 ]')
('correct:', '[1 ]')
error: 0.47876379
('out: ', '[-0.04 ]')
('correct:', '[0 ]')
error: 0.00079160
('All errors:', [0.4884297811030845, 0.47382096780393873, 0.47876378995939756, 0.0007915982149002194])
('Average error:', 0.3604515342703303)
('Max error:', 0.4884297811030845, 'Median error:', 0.47876378995939756)
We are getting an average error of 0.36, which shows that the test data no longer matches what the network was trained on.
PyBrain - Testing Network
In this chapter, we are going to see an example in which we train a network and measure the errors on the training and test data.
We are going to make use of the following:
BackpropTrainer
BackpropTrainer is a trainer that trains the parameters of a module according to a supervised or ClassificationDataSet dataset (potentially sequential) by backpropagating the errors (through time).
trainUntilConvergence
This trainer method is used to train the module on the dataset until it converges.
When we create a neural network, it is trained on the training data given to it. Whether the network has been trained properly is judged by its predictions on test data.
Let us see a working example, step by step, in which we build a neural network and report the training, validation and test errors.
Testing our Network
Following are the steps we will follow for testing our network:
- Importing the required PyBrain and other packages
- Creating a ClassificationDataSet
- Splitting the dataset into 25% test data and 75% training data
- Converting the test data and training data back to ClassificationDataSet
- Creating a neural network
- Training the network
- Visualizing the training and validation errors
- Computing the percentage error on the test data
Step 1
Import the required PyBrain and other packages, as shown below:
from sklearn import datasets
import matplotlib.pyplot as plt
from pybrain.datasets import ClassificationDataSet
from pybrain.utilities import percentError
from pybrain.tools.shortcuts import buildNetwork
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.structure.modules import SoftmaxLayer
from numpy import ravel
Step 2
The next step is to create a ClassificationDataSet.
For the data, we are going to use the load_digits dataset from sklearn, as shown below. Refer to the sklearn documentation for details of the load_digits dataset.
digits = datasets.load_digits()
X, y = digits.data, digits.target
# Each input is a 64-dimensional array; the digits range from 0 to 9, so there are 10 classes.
ds = ClassificationDataSet(64, 1, nb_classes=10)
for i in range(len(X)):
    ds.addSample(ravel(X[i]), y[i])  # add each sample to the dataset
Step 3
Split the dataset into 25% test data and 75% training data:
test_data_temp, training_data_temp = ds.splitWithProportion(0.25)
Here we have called the dataset method splitWithProportion() with the value 0.25; it splits the dataset into 25% test data and 75% training data.
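To make the split concrete, here is a minimal plain-Python sketch of a shuffled proportional split. It is not PyBrain's implementation: `split_with_proportion` is a hypothetical function, and rounding the first part down is an assumption of this sketch.

```python
import random


def split_with_proportion(samples, proportion, seed=None):
    """Shuffle the samples and split them into two lists.

    The first list holds int(len(samples) * proportion) items;
    rounding down is an assumption of this sketch.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * proportion)
    return shuffled[:cut], shuffled[cut:]


# The digits dataset has 1797 samples; plain indices stand in for them here.
test_part, train_part = split_with_proportion(range(1797), 0.25, seed=0)
print(len(test_part), len(train_part))  # 449 1348
```

Shuffling before cutting matters because the samples may be stored in a systematic order, in which case an unshuffled cut would give test and training parts with different class mixes.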
Step 4
Convert the test data and training data back to ClassificationDataSet.
test_data = ClassificationDataSet(64, 1, nb_classes=10)
for n in range(0, test_data_temp.getLength()):
    test_data.addSample(test_data_temp.getSample(n)[0], test_data_temp.getSample(n)[1])
training_data = ClassificationDataSet(64, 1, nb_classes=10)
for n in range(0, training_data_temp.getLength()):
    training_data.addSample(
        training_data_temp.getSample(n)[0], training_data_temp.getSample(n)[1]
    )
test_data._convertToOneOfMany()
training_data._convertToOneOfMany()
Calling splitWithProportion() on the dataset returns SupervisedDataSet objects, so we convert both parts back to ClassificationDataSet as shown in the step above.
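The calls to _convertToOneOfMany() above re-encode each class label as a vector with one entry per class, which is the form a 10-unit output layer is trained against. The sketch below shows that encoding in plain Python; `one_of_many` is a hypothetical helper, and the 0/1 values are an assumption about the encoding rather than a statement of PyBrain's internals.

```python
def one_of_many(label, nb_classes):
    """Return a list with 1 at index `label` and 0 everywhere else."""
    if not 0 <= label < nb_classes:
        raise ValueError("label out of range")
    return [1 if i == label else 0 for i in range(nb_classes)]


print(one_of_many(3, 10))  # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```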
Step 5
The next step is to create a neural network.
net = buildNetwork(training_data.indim, 64, training_data.outdim, outclass=SoftmaxLayer)
We are creating a network whose input and output sizes are taken from the training data (training_data.indim and training_data.outdim), with 64 hidden units and a softmax output layer.
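A softmax output layer turns the raw scores of the output units into positive values that sum to 1, so the largest value can be read as the predicted class. The following is a minimal plain-Python sketch of the softmax function itself, not of PyBrain's SoftmaxLayer class; the scores are made-up values.

```python
import math


def softmax(values):
    """Map raw scores to positive values that sum to 1."""
    shift = max(values)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.1]  # hypothetical raw outputs for three classes
probs = softmax(scores)
print(probs)
print(probs.index(max(probs)))  # index of the most likely class
```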
Step 6
Training the Network
Now the important part is to set up training of the network on the dataset, as shown below:
trainer = BackpropTrainer(net, dataset=training_data,
    momentum=0.1, learningrate=0.01, verbose=True, weightdecay=0.01)
We create a BackpropTrainer for the network we built and pass it the training dataset.
Step 7
The next step is to visualize the training and validation errors.
trnerr, valerr = trainer.trainUntilConvergence(dataset=training_data, maxEpochs=10)
plt.plot(trnerr, 'b', valerr, 'r')
plt.show()
We call the trainUntilConvergence method on the training data with maxEpochs=10, so training runs for at most 10 epochs. It returns the training errors and the validation errors, which we plot: the blue line shows the training errors and the red line shows the validation errors.
The total error reported during execution of the above code is shown below:
Total error: 0.0432857814358
Total error: 0.0222276374185
Total error: 0.0149012052174
Total error: 0.011876985318
Total error: 0.00939854792853
Total error: 0.00782202445183
Total error: 0.00714707652044
Total error: 0.00606068893793
Total error: 0.00544257958975
Total error: 0.00463929281336
Total error: 0.00441275665294
('train-errors:', '[0.043286 , 0.022228 , 0.014901 , 0.011877 , 0.009399 , 0.007822 , 0.007147 , 0.006061 , 0.005443 , 0.004639 , 0.004413 ]')
('valid-errors:', '[0.074296 , 0.027332 , 0.016461 , 0.014298 , 0.012129 , 0.009248 , 0.008922 , 0.007917 , 0.006547 , 0.005883 , 0.006572 , 0.005811 ]')
The error starts at about 0.04 and goes down with each epoch, which means the network is being trained and improves with each epoch.
Step 8
Percentage for test data error
We can check the percent error using the percentError function, as shown below:
print('Percent Error on testData:',
      percentError(trainer.testOnClassData(dataset=test_data), test_data['class']))
Percent Error on testData: 3.34075723830735
The error is 3.34%, which means the neural network classifies about 96.7% of the test samples correctly.
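The percent error is simply the share of test samples whose predicted class differs from the true class. The sketch below is a plain-Python illustration, not PyBrain's percentError; `percent_error` is a hypothetical function. The figure 3.34 is consistent with 15 misclassified samples out of 449, which is an inference from the number and not something the output states.

```python
def percent_error(predicted, actual):
    """Percentage of positions where predicted and actual labels differ."""
    if len(predicted) != len(actual):
        raise ValueError("length mismatch")
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return 100.0 * wrong / len(actual)


# Hypothetical labels: 449 test samples, 15 of them misclassified.
actual = [0] * 449
predicted = [1] * 15 + [0] * 434
print(percent_error(predicted, actual))
```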
Below is the full code:
from sklearn import datasets
import matplotlib.pyplot as plt
from pybrain.datasets import ClassificationDataSet
from pybrain.utilities import percentError
from pybrain.tools.shortcuts import buildNetwork
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.structure.modules import SoftmaxLayer
from numpy import ravel
digits = datasets.load_digits()
X, y = digits.data, digits.target
ds = ClassificationDataSet(64, 1, nb_classes=10)
for i in range(len(X)):
    ds.addSample(ravel(X[i]), y[i])
test_data_temp, training_data_temp = ds.splitWithProportion(0.25)
test_data = ClassificationDataSet(64, 1, nb_classes=10)
for n in range(0, test_data_temp.getLength()):
    test_data.addSample(test_data_temp.getSample(n)[0], test_data_temp.getSample(n)[1])
training_data = ClassificationDataSet(64, 1, nb_classes=10)
for n in range(0, training_data_temp.getLength()):
    training_data.addSample(
        training_data_temp.getSample(n)[0], training_data_temp.getSample(n)[1]
    )
test_data._convertToOneOfMany()
training_data._convertToOneOfMany()
net = buildNetwork(training_data.indim, 64, training_data.outdim, outclass=SoftmaxLayer)
trainer = BackpropTrainer(
    net, dataset=training_data, momentum=0.1,
    learningrate=0.01, verbose=True, weightdecay=0.01
)
trnerr, valerr = trainer.trainUntilConvergence(dataset=training_data, maxEpochs=10)
plt.plot(trnerr, 'b', valerr, 'r')
plt.show()
trainer.trainEpochs(10)
print('Percent Error on testData:', percentError(
    trainer.testOnClassData(dataset=test_data), test_data['class']
))
PyBrain - Working with Feed-Forward Networks
A feed-forward network is a neural network in which information between nodes moves only in the forward direction and never travels backward. It is the first and simplest type of artificial neural network. Information passes from the input nodes to the hidden nodes and then to the output node.
In this chapter, we are going to discuss how to:
- Create feed-forward networks
- Add connections and modules to a feed-forward network
Creating a Feed Forward Network
You can use the Python IDE of your choice, e.g., PyCharm. Here, we are using Visual Studio Code to write the code and will execute it in the terminal.
To create a feed-forward network, we need to import it from pybrain.structure as shown below:
ffn.py
from pybrain.structure import FeedForwardNetwork
network = FeedForwardNetwork()
print(network)
Execute ffn.py as shown below:
C:\pybrain\pybrain\src>python ffn.py
FeedForwardNetwork-0
Modules:
[]
Connections:
[]
We have not added any modules or connections to the feed-forward network. Hence the network shows empty lists for Modules and Connections.
Adding Modules and Connections
First we will create the input, hidden and output layers and add them to the network as modules, as shown below:
ffn.py
from pybrain.structure import FeedForwardNetwork
from pybrain.structure import LinearLayer, SigmoidLayer
network = FeedForwardNetwork()
#creating layer for input => 2 , hidden=> 3 and output=>1
inputLayer = LinearLayer(2)
hiddenLayer = SigmoidLayer(3)
outputLayer = LinearLayer(1)
#adding the layer to feedforward network
network.addInputModule(inputLayer)
network.addModule(hiddenLayer)
network.addOutputModule(outputLayer)
print(network)
Output
C:\pybrain\pybrain\src>python ffn.py
FeedForwardNetwork-3
Modules:
[]
Connections:
[]
The modules and connections are still shown as empty. We need to provide connections between the modules we created.
Here is the code in which we create connections between the input, hidden and output layers and add them to the network:
ffn.py
from pybrain.structure import FeedForwardNetwork
from pybrain.structure import LinearLayer, SigmoidLayer
from pybrain.structure import FullConnection
network = FeedForwardNetwork()
#creating layer for input => 2 , hidden=> 3 and output=>1
inputLayer = LinearLayer(2)
hiddenLayer = SigmoidLayer(3)
outputLayer = LinearLayer(1)
#adding the layer to feedforward network
network.addInputModule(inputLayer)
network.addModule(hiddenLayer)
network.addOutputModule(outputLayer)
#Create connection between input ,hidden and output
input_to_hidden = FullConnection(inputLayer, hiddenLayer)
hidden_to_output = FullConnection(hiddenLayer, outputLayer)
#add connection to the network
network.addConnection(input_to_hidden)
network.addConnection(hidden_to_output)
print(network)
Output
C:\pybrain\pybrain\src>python ffn.py
FeedForwardNetwork-3
Modules:
[]
Connections:
[]
We are still not able to see the modules and connections. Let us now add the final step, which is a call to the sortModules() method, as shown below:
ffn.py
from pybrain.structure import FeedForwardNetwork
from pybrain.structure import LinearLayer, SigmoidLayer
from pybrain.structure import FullConnection
network = FeedForwardNetwork()
#creating layer for input => 2 , hidden=> 3 and output=>1
inputLayer = LinearLayer(2)
hiddenLayer = SigmoidLayer(3)
outputLayer = LinearLayer(1)
#adding the layer to feedforward network
network.addInputModule(inputLayer)
network.addModule(hiddenLayer)
network.addOutputModule(outputLayer)
#Create connection between input ,hidden and output
input_to_hidden = FullConnection(inputLayer, hiddenLayer)
hidden_to_output = FullConnection(hiddenLayer, outputLayer)
#add connection to the network
network.addConnection(input_to_hidden)
network.addConnection(hidden_to_output)
network.sortModules()
print(network)
Output
C:\pybrain\pybrain\src>python ffn.py
FeedForwardNetwork-6
Modules:
[<LinearLayer 'LinearLayer-3'>, <SigmoidLayer 'SigmoidLayer-7'>,
<LinearLayer 'LinearLayer-8'>]
Connections:
[<FullConnection 'FullConnection-4': 'SigmoidLayer-7' -> 'LinearLayer-8'>,
<FullConnection 'FullConnection-5': 'LinearLayer-3' -> 'SigmoidLayer-7'>]
We are now able to see the module and connection details of the feed-forward network.
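To show what a network of this shape computes, here is a minimal plain-Python sketch of one forward pass through a 2-3-1 network with a linear input layer, a sigmoid hidden layer and a linear output layer. It does not use PyBrain; the function names and the weights are hypothetical, chosen so that the result is easy to check by hand.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, w_in_hidden, w_hidden_out):
    """Forward pass: linear input, sigmoid hidden layer, linear output.

    w_in_hidden[j][i] is the weight from input i to hidden unit j;
    w_hidden_out[j] is the weight from hidden unit j to the output.
    """
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)))
              for row in w_in_hidden]
    return sum(w * h for w, h in zip(w_hidden_out, hidden))


# Hypothetical weights: zero input weights make every hidden unit output 0.5.
w_in_hidden = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
w_hidden_out = [1.0, 1.0, 1.0]
print(forward([1.0, 2.0], w_in_hidden, w_hidden_out))  # 3 * sigmoid(0) = 1.5
```

Information flows strictly from inputs to hidden units to the output, which is the feed-forward property described at the start of this chapter.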
PyBrain - Working with Recurrent Networks
A recurrent network is the same as a feed-forward network, with the one difference that it has to remember the data at each step. The history of each step has to be saved.
We will learn how to:
- Create a recurrent network
- Add modules and connections
Creating a Recurrent Network
To create a recurrent network, we will use the RecurrentNetwork class as shown below:
rn.py
from pybrain.structure import RecurrentNetwork
recurrentn = RecurrentNetwork()
print(recurrentn)
C:\pybrain\pybrain\src>python rn.py
RecurrentNetwork-0
Modules:
[]
Connections:
[]
Recurrent Connections:
[]
We can see a new kind of connection, called Recurrent Connections, for the recurrent network. Right now it is empty.
Let us now create the layers, add them as modules and create the connections.
Adding Modules and Connection
We are going to create the layers, i.e., input, hidden and output, and add them to the network as the input, hidden and output modules. Next, we will create the connections from input to hidden and from hidden to output, and a recurrent connection from hidden to hidden.
Here is the code for the recurrent network with modules and connections:
rn.py
from pybrain.structure import RecurrentNetwork
from pybrain.structure import LinearLayer, SigmoidLayer
from pybrain.structure import FullConnection
recurrentn = RecurrentNetwork()
#creating layer for input => 2 , hidden=> 3 and output=>1
inputLayer = LinearLayer(2, 'rn_in')
hiddenLayer = SigmoidLayer(3, 'rn_hidden')
outputLayer = LinearLayer(1, 'rn_output')
#adding the layers to the recurrent network
recurrentn.addInputModule(inputLayer)
recurrentn.addModule(hiddenLayer)
recurrentn.addOutputModule(outputLayer)
#Create connection between input ,hidden and output
input_to_hidden = FullConnection(inputLayer, hiddenLayer)
hidden_to_output = FullConnection(hiddenLayer, outputLayer)
hidden_to_hidden = FullConnection(hiddenLayer, hiddenLayer)
#add connection to the network
recurrentn.addConnection(input_to_hidden)
recurrentn.addConnection(hidden_to_output)
recurrentn.addRecurrentConnection(hidden_to_hidden)
recurrentn.sortModules()
print(recurrentn)
C:\pybrain\pybrain\src>python rn.py
RecurrentNetwork-6
Modules:
[<LinearLayer 'rn_in'>, <SigmoidLayer 'rn_hidden'>,
<LinearLayer 'rn_output'>]
Connections:
[<FullConnection 'FullConnection-4': 'rn_hidden' -> 'rn_output'>,
<FullConnection 'FullConnection-5': 'rn_in' -> 'rn_hidden'>]
Recurrent Connections:
[<FullConnection 'FullConnection-3': 'rn_hidden' -> 'rn_hidden'>]
In the above output we can see the Modules, Connections and Recurrent Connections.
The network can now be activated by calling its activate() method with an input vector.
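The source does not include a listing for this step. To illustrate why a recurrent network has to remember data between steps, here is a minimal plain-Python sketch of a single hidden unit that feeds its previous output back into itself. It does not use PyBrain; `TinyRecurrentUnit` and its weights are hypothetical, and only the method names activate() and reset() mirror the PyBrain network interface.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyRecurrentUnit:
    """One sigmoid unit that receives its own previous output as extra input."""

    def __init__(self, w_in, w_rec):
        self.w_in = w_in     # weight on the current input
        self.w_rec = w_rec   # weight on the previous hidden output
        self.hidden = 0.0    # state remembered from the last step

    def activate(self, x):
        self.hidden = sigmoid(self.w_in * x + self.w_rec * self.hidden)
        return self.hidden

    def reset(self):
        self.hidden = 0.0


unit = TinyRecurrentUnit(w_in=1.0, w_rec=0.5)
first = unit.activate(2.0)
second = unit.activate(2.0)  # same input, different result: state was kept
unit.reset()
again = unit.activate(2.0)   # equals `first` once the state is cleared
print(first, second, again)
```

The same input gives different outputs on successive calls because the stored state changes between them, which is the behaviour the hidden-to-hidden recurrent connection introduces.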
Training Network Using Optimization Algorithms
We have seen how to train a network using the trainers in PyBrain. In this chapter, we will use the optimization algorithms available in PyBrain to train a network.
In the example, we will use the GA optimization algorithm, which needs to be imported as shown below:
from pybrain.optimization.populationbased.ga import GA
Example
Below is a working example of training a network using the GA optimization algorithm:
from pybrain.datasets.classification import ClassificationDataSet
from pybrain.optimization.populationbased.ga import GA
from pybrain.tools.shortcuts import buildNetwork
# create XOR dataset
ds = ClassificationDataSet(2)
ds.addSample([0., 0.], [0.])
ds.addSample([0., 1.], [1.])
ds.addSample([1., 0.], [1.])
ds.addSample([1., 1.], [0.])
ds.setField('class', [ [0.],[1.],[1.],[0.]])
net = buildNetwork(2, 3, 1)
ga = GA(ds.evaluateModuleMSE, net, minimize=True)
for i in range(100):
    net = ga.learn(0)[0]
print(net.activate([0,0]))
print(net.activate([1,0]))
print(net.activate([0,1]))
print(net.activate([1,1]))
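The GA in the example above minimises ds.evaluateModuleMSE, that is, the mean squared error of the network over the dataset. The sketch below shows that quantity in plain Python for the XOR samples. `mean_squared_error` and the two candidate predictors are hypothetical, and whether PyBrain applies any additional scaling to the value is not covered here.

```python
def mean_squared_error(predict, samples):
    """Average squared difference between predict(inputs) and the target."""
    total = 0.0
    for inputs, target in samples:
        total += (predict(inputs) - target) ** 2
    return total / len(samples)


xor_samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def perfect(inputs):
    """A predictor that reproduces XOR exactly."""
    return inputs[0] ^ inputs[1]


def constant_half(inputs):
    """A predictor that ignores its inputs."""
    return 0.5


print(mean_squared_error(perfect, xor_samples))        # 0.0
print(mean_squared_error(constant_half, xor_samples))  # 0.25
```

An optimizer such as a genetic algorithm only needs this kind of score for each candidate; it does not need gradients, which is the main difference from backpropagation-based trainers.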
PyBrain - Layers
Layers are basically sets of functions that are applied in the hidden layers of a network.
We will go through the following details about layers in this chapter:
- Understanding layers
- Creating a layer using PyBrain
Understanding layers
We have seen examples earlier in which we used the following layers:
- TanhLayer
- SoftmaxLayer
Example using TanhLayer
Below is one example in which we use TanhLayer for building a network:
testnetwork.py
from pybrain.tools.shortcuts import buildNetwork
from pybrain.structure import TanhLayer
from pybrain.datasets import SupervisedDataSet
from pybrain.supervised.trainers import BackpropTrainer
# Create a network with two inputs, three hidden, and one output
nn = buildNetwork(2, 3, 1, bias=True, hiddenclass=TanhLayer)
# Create a dataset that matches network input and output sizes:
norgate = SupervisedDataSet(2, 1)
# Create a dataset to be used for testing.
nortrain = SupervisedDataSet(2, 1)
# Add input and target values to dataset
# Values for NOR truth table
norgate.addSample((0, 0), (1,))
norgate.addSample((0, 1), (0,))
norgate.addSample((1, 0), (0,))
norgate.addSample((1, 1), (0,))
# Add input and target values to dataset
# Values for NOR truth table
nortrain.addSample((0, 0), (1,))
nortrain.addSample((0, 1), (0,))
nortrain.addSample((1, 0), (0,))
nortrain.addSample((1, 1), (0,))
# Train the network with the norgate dataset.
trainer = BackpropTrainer(nn, norgate)
# Run the training loop 1000 times.
for epoch in range(1000):
    trainer.train()
trainer.testOnData(dataset=nortrain, verbose=True)
Output
The output for the above code is as follows:
C:\pybrain\pybrain\src>python testnetwork.py
Testing on data:
('out: ', '[0.887 ]')
('correct:', '[1 ]')
error: 0.00637334
('out: ', '[0.149 ]')
('correct:', '[0 ]')
error: 0.01110338
('out: ', '[0.102 ]')
('correct:', '[0 ]')
error: 0.00522736
('out: ', '[-0.163]')
('correct:', '[0 ]')
error: 0.01328650
('All errors:', [0.006373344564625953, 0.01110338071737218,
0.005227359234093431, 0.01328649974219942])
('Average error:', 0.008997646064572746)
('Max error:', 0.01328649974219942, 'Median error:', 0.01110338071737218)
Example using SoftMaxLayer
Below is one example in which we use SoftmaxLayer for building a network:
from pybrain.tools.shortcuts import buildNetwork
from pybrain.structure.modules import SoftmaxLayer
from pybrain.datasets import SupervisedDataSet
from pybrain.supervised.trainers import BackpropTrainer
# Create a network with two inputs, three hidden, and one output
nn = buildNetwork(2, 3, 1, bias=True, hiddenclass=SoftmaxLayer)
# Create a dataset that matches network input and output sizes:
norgate = SupervisedDataSet(2, 1)
# Create a dataset to be used for testing.
nortrain = SupervisedDataSet(2, 1)
# Add input and target values to dataset
# Values for NOR truth table
norgate.addSample((0, 0), (1,))
norgate.addSample((0, 1), (0,))
norgate.addSample((1, 0), (0,))
norgate.addSample((1, 1), (0,))
# Add input and target values to dataset
# Values for NOR truth table
nortrain.addSample((0, 0), (1,))
nortrain.addSample((0, 1), (0,))
nortrain.addSample((1, 0), (0,))
nortrain.addSample((1, 1), (0,))
# Train the network with the norgate dataset.
trainer = BackpropTrainer(nn, norgate)
# Run the training loop 1000 times.
for epoch in range(1000):
    trainer.train()
trainer.testOnData(dataset=nortrain, verbose=True)
Output
The output is as follows:
C:\pybrain\pybrain\src>python example16.py
Testing on data:
('out: ', '[0.918 ]')
('correct:', '[1 ]')
error: 0.00333524
('out: ', '[0.082 ]')
('correct:', '[0 ]')
error: 0.00333484
('out: ', '[0.078 ]')
('correct:', '[0 ]')
error: 0.00303433
('out: ', '[-0.082]')
('correct:', '[0 ]')
error: 0.00340005
('All errors:', [0.0033352368788838365, 0.003334842961037291,
0.003034328685718761, 0.0034000458892589056])
('Average error:', 0.0032761136037246985)
('Max error:', 0.0034000458892589056, 'Median error:', 0.0033352368788838365)
Creating Layer in Pybrain
In PyBrain, you can create your own layer as follows.
To create a layer, you need to use the NeuronLayer class as the base class; all types of layers derive from it.
Example
from pybrain.structure.modules.neuronlayer import NeuronLayer
class LinearLayer(NeuronLayer):
    def _forwardImplementation(self, inbuf, outbuf):
        outbuf[:] = inbuf
    def _backwardImplementation(self, outerr, inerr, outbuf, inbuf):
        inerr[:] = outerr
To create a layer, we need to implement two methods: _forwardImplementation() and _backwardImplementation().
_forwardImplementation() takes two arguments, inbuf and outbuf, which are SciPy arrays. Their sizes depend on the layer's input and output dimensions.
_backwardImplementation() is used to calculate the derivative of the output with respect to the given input.
So to implement a layer in PyBrain, this is the skeleton of the layer class:
from pybrain.structure.modules.neuronlayer import NeuronLayer
class NewLayer(NeuronLayer):
    def _forwardImplementation(self, inbuf, outbuf):
        pass
    def _backwardImplementation(self, outerr, inerr, outbuf, inbuf):
        pass
In case you want to implement a quadratic polynomial function as a layer, you can do so as follows.
Consider the polynomial function:
f(x) = 3x^2
The derivative of the above polynomial function is:
f'(x) = 6x
The final layer class for the above polynomial function is as follows:
testlayer.py
from pybrain.structure.modules.neuronlayer import NeuronLayer
class PolynomialLayer(NeuronLayer):
    def _forwardImplementation(self, inbuf, outbuf):
        outbuf[:] = 3*inbuf**2
    def _backwardImplementation(self, outerr, inerr, outbuf, inbuf):
        inerr[:] = 6*inbuf*outerr
Now let us make use of the layer we created, as shown below:
testlayer1.py
from testlayer import PolynomialLayer
from pybrain.tools.shortcuts import buildNetwork
from pybrain.tests.helpers import gradientCheck
n = buildNetwork(2, 3, 1, hiddenclass=PolynomialLayer)
n.randomize()
gradientCheck(n)
gradientCheck() tests whether the layer is working correctly. We need to pass the network in which the layer is used to gradientCheck(n). It gives the output “Perfect Gradient” if the layer is working fine.
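The idea behind a gradient check can be shown without PyBrain: compare the hand-written derivative with a numerical estimate and confirm that they agree. The sketch below does this for f(x) = 3x^2 using a central difference. The helper names and the tolerance are choices made for this illustration, not a description of how gradientCheck() is implemented.

```python
def f(x):
    return 3 * x ** 2


def f_prime(x):
    return 6 * x


def numerical_derivative(func, x, eps=1e-6):
    """Central-difference approximation of the derivative of func at x."""
    return (func(x + eps) - func(x - eps)) / (2 * eps)


for x in (-2.0, 0.5, 3.0):
    analytic = f_prime(x)
    numeric = numerical_derivative(f, x)
    print(x, analytic, numeric, abs(analytic - numeric) < 1e-4)
```

If the backward implementation of a layer contained a mistake, for example a wrong constant factor, the two values would disagree and a check like this would expose it.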
PyBrain - Connections
A connection works similarly to a layer; the only difference is that it moves data from one node to another in a network.
In this chapter, we are going to learn about:
- Understanding connections
- Creating connections
Understanding Connections
Here is a working example of the connections used while creating a network.
Example
ffn.py
from pybrain.structure import FeedForwardNetwork
from pybrain.structure import LinearLayer, SigmoidLayer
from pybrain.structure import FullConnection
network = FeedForwardNetwork()
#creating layer for input => 2 , hidden=> 3 and output=>1
inputLayer = LinearLayer(2)
hiddenLayer = SigmoidLayer(3)
outputLayer = LinearLayer(1)
#adding the layer to feedforward network
network.addInputModule(inputLayer)
network.addModule(hiddenLayer)
network.addOutputModule(outputLayer)
#Create connection between input ,hidden and output
input_to_hidden = FullConnection(inputLayer, hiddenLayer)
hidden_to_output = FullConnection(hiddenLayer, outputLayer)
#add connection to the network
network.addConnection(input_to_hidden)
network.addConnection(hidden_to_output)
network.sortModules()
print(network)
Output
C:\pybrain\pybrain\src>python ffn.py
FeedForwardNetwork-6
Modules:
[<LinearLayer 'LinearLayer-3'>, <SigmoidLayer 'SigmoidLayer-7'>,
<LinearLayer 'LinearLayer-8'>]
Connections:
[<FullConnection 'FullConnection-4': 'SigmoidLayer-7' -> 'LinearLayer-8'>,
<FullConnection 'FullConnection-5': 'LinearLayer-3' -> 'SigmoidLayer-7'>]
Creating Connections
In PyBrain, we can create our own connections by subclassing Connection from the connection module, as shown below:
Example
connect.py
from pybrain.structure.connections.connection import Connection
class YourConnection(Connection):
    def __init__(self, *args, **kwargs):
        Connection.__init__(self, *args, **kwargs)
    def _forwardImplementation(self, inbuf, outbuf):
        outbuf += inbuf
    def _backwardImplementation(self, outerr, inerr, inbuf):
        inerr += outerr
To create a connection, two methods need to be implemented: _forwardImplementation() and _backwardImplementation().
_forwardImplementation() is called with the output buffer of the incoming module, inbuf, and the input buffer of the outgoing module, outbuf. The inbuf is added to the outgoing module's outbuf.
_backwardImplementation() is called with outerr, inerr and inbuf. The outgoing module's error is added to the incoming module's error in _backwardImplementation().
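Both methods add to their target buffer rather than overwrite it. The plain-Python sketch below shows that accumulation with ordinary lists; the function names are hypothetical and the code illustrates the pattern only, not PyBrain's buffer handling.

```python
def forward(inbuf, outbuf):
    """Add the incoming module's output to the outgoing module's input."""
    for i, value in enumerate(inbuf):
        outbuf[i] += value


def backward(outerr, inerr):
    """Add the outgoing module's error to the incoming module's error."""
    for i, value in enumerate(outerr):
        inerr[i] += value


outbuf = [1.0, 1.0]          # already holds a contribution from elsewhere
forward([0.5, 2.0], outbuf)
print(outbuf)                # [1.5, 3.0]

inerr = [0.0, 0.0]
backward([0.1, -0.2], inerr)
print(inerr)                 # [0.1, -0.2]
```

Accumulating lets several connections feed the same module. Note that an identity-style connection like this one only makes sense between modules of equal size, since each input element is added to the output element at the same position.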
Let us now use YourConnection in a network.
testconnection.py
from pybrain.structure import FeedForwardNetwork
from pybrain.structure import LinearLayer, SigmoidLayer
from connect import YourConnection
network = FeedForwardNetwork()
#creating layer for input => 2 , hidden=> 3 and output=>1
inputLayer = LinearLayer(2)
hiddenLayer = SigmoidLayer(3)
outputLayer = LinearLayer(1)
#adding the layer to feedforward network
network.addInputModule(inputLayer)
network.addModule(hiddenLayer)
network.addOutputModule(outputLayer)
#Create connection between input ,hidden and output
input_to_hidden = YourConnection(inputLayer, hiddenLayer)
hidden_to_output = YourConnection(hiddenLayer, outputLayer)
#add connection to the network
network.addConnection(input_to_hidden)
network.addConnection(hidden_to_output)
network.sortModules()
print(network)
Output
C:\pybrain\pybrain\src>python testconnection.py
FeedForwardNetwork-6
Modules:
[<LinearLayer 'LinearLayer-3'>, <SigmoidLayer 'SigmoidLayer-7'>,
<LinearLayer 'LinearLayer-8'>]
Connections:
[<YourConnection 'YourConnection-4': 'LinearLayer-3' -> 'SigmoidLayer-7'>,
<YourConnection 'YourConnection-5': 'SigmoidLayer-7' -> 'LinearLayer-8'>]
PyBrain - Reinforcement Learning Module
Reinforcement Learning (RL) is an important part of machine learning. In reinforcement learning, the agent learns its behaviour from inputs it receives from the environment.
The components that interact with each other during reinforcement learning are as follows:
- Environment
- Agent
- Task
- Experiment
The interaction between these components proceeds as follows.
In RL, the agent interacts with the environment iteratively. At each iteration, the agent receives an observation and a reward. It then chooses an action and sends it to the environment. At each iteration the environment moves to a new state, and the reward received each time is saved.
The goal of an RL agent is to collect as much reward as possible. Between iterations, the agent's performance is compared with that of an agent that acts well, and the difference in performance gives rise to either reward or failure. RL is mainly used in problem-solving tasks such as robot control, elevators, telecommunications and games.
Let us take a look at how to work with RL in PyBrain.
We are going to work with the maze environment, which is represented by a 2-dimensional numpy array in which 1 is a wall and 0 is a free field. The agent's job is to move over the free fields and find the goal point.
Here is a step-by-step flow of working with the maze environment.
Step 1
Import the packages we need with the code below:
from numpy import array  # array() is used to define the maze
import sys, time
import matplotlib.pyplot as pylab  # matplotlib is used for visualization
from pybrain.rl.environments.mazes import Maze, MDPMazeTask
from pybrain.rl.learners.valuebased import ActionValueTable
from pybrain.rl.agents import LearningAgent
from pybrain.rl.learners import Q, QLambda, SARSA #@UnusedImport
from pybrain.rl.explorers import BoltzmannExplorer #@UnusedImport
from pybrain.rl.experiments import Experiment
from pybrain.rl.environments import Task
Step 2
Create the maze environment using the code below −
# create the maze: 1 is a wall and 0 is a free field
mazearray = array(
    [[1, 1, 1, 1, 1, 1, 1, 1, 1],
     [1, 0, 0, 1, 0, 0, 0, 0, 1],
     [1, 0, 0, 1, 0, 0, 1, 0, 1],
     [1, 0, 0, 1, 0, 0, 1, 0, 1],
     [1, 0, 0, 1, 0, 1, 1, 0, 1],
     [1, 0, 0, 0, 0, 0, 1, 0, 1],
     [1, 1, 1, 1, 1, 1, 1, 0, 1],
     [1, 0, 0, 0, 0, 0, 0, 0, 1],
     [1, 1, 1, 1, 1, 1, 1, 1, 1]]
)
# create the environment: the first parameter is the maze array
# and the second one is the goal field tuple
env = Maze(mazearray, (7, 7))
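Before handing the array to the environment, it can help to check the layout. The sketch below uses plain nested lists instead of a numpy array so that it runs without any dependencies; the variable names maze, free_fields, goal_is_free and border_is_wall are our own and do not come from PyBrain.

```python
# Same layout as the maze above: 1 is a wall, 0 is a free field
maze = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 1, 0, 0, 0, 0, 1],
    [1, 0, 0, 1, 0, 0, 1, 0, 1],
    [1, 0, 0, 1, 0, 0, 1, 0, 1],
    [1, 0, 0, 1, 0, 1, 1, 0, 1],
    [1, 0, 0, 0, 0, 0, 1, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 0, 1],
    [1, 0, 0, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
]
goal = (7, 7)

rows, cols = len(maze), len(maze[0])

# every cell the agent is allowed to stand on
free_fields = [(r, c) for r in range(rows) for c in range(cols) if maze[r][c] == 0]

# the goal must be a free field, otherwise it can never be reached
goal_is_free = maze[goal[0]][goal[1]] == 0

# the outer border should be all walls so the agent cannot leave the grid
border_is_wall = all(
    maze[r][c] == 1
    for r in range(rows)
    for c in range(cols)
    if r in (0, rows - 1) or c in (0, cols - 1)
)

print(rows * cols, len(free_fields), goal_is_free, border_is_wall)
```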
Step 3
The next step is to create the agent.
The agent plays an important role in RL. It interacts with the maze environment through its getAction() and integrateObservation() methods.
The agent has a controller (which maps states to actions) and a learner.
The controller in PyBrain is like a module: it takes states as input and converts them into actions.
controller = ActionValueTable(81, 4)
controller.initialize(1.)
The ActionValueTable needs 2 inputs: the number of states and the number of actions. Here the 9 x 9 maze gives 81 states, and the standard maze environment has 4 actions: north, south, east, west.
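To make the shape of such a table concrete, here is a minimal plain-Python sketch of an 81 x 4 table. It is not PyBrain's ActionValueTable: the row-major state numbering in state_index(), the order of the action names, and the greedy_action() helper are assumptions made for illustration.

```python
N_ROWS, N_COLS, N_ACTIONS = 9, 9, 4
ACTIONS = ["north", "south", "east", "west"]  # order is illustrative only


def state_index(row, col):
    """Row-major index of a maze cell (an assumed numbering scheme)."""
    return row * N_COLS + col


# one row of action values per state, every value initialised to 1.0
table = [[1.0] * N_ACTIONS for _ in range(N_ROWS * N_COLS)]


def greedy_action(state):
    """Index of the action with the highest value in the given state."""
    values = table[state]
    return values.index(max(values))


s = state_index(7, 7)
table[s][2] = 2.0  # pretend that the third action has proven more valuable here
print(s, ACTIONS[greedy_action(s)])
```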
Now we will create a learner. We are going to use the SARSA() learning algorithm for the learner that works with the agent.
learner = SARSA()
agent = LearningAgent(controller, learner)
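SARSA is an on-policy temporal-difference method: it updates the value of a state-action pair using the reward and the value of the next state-action pair that the agent actually takes. The sketch below shows that textbook update rule on a nested list. It is not PyBrain's implementation, and the function name sarsa_update and the values chosen for alpha (learning rate) and gamma (discount factor) are illustrative assumptions.

```python
def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.5, gamma=0.9):
    """Apply one SARSA step to the table q in place and return the new value.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * Q(s', a') - Q(s, a))
    """
    target = reward + gamma * q[next_state][next_action]
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


# two states with two actions each; values are made up for the example
q = [[0.0, 0.0],
     [0.0, 1.0]]

new_value = sarsa_update(q, state=0, action=1, reward=1.0,
                         next_state=1, next_action=1)
print(new_value)
```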
Step 4
This step adds the agent to the environment.
To connect the agent to the environment, we need a special component called a task. The task defines what the goal in the environment is and how the agent is rewarded for its actions.
The environment has its own task. The Maze environment that we have used comes with the MDPMazeTask task. MDP stands for "Markov decision process", which here means that the agent knows its position in the maze. The environment is passed as a parameter to the task.
task = MDPMazeTask(env)
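The role of a task can be illustrated with a small stand-alone sketch. ToyMaze and ToyMazeTask below are hypothetical classes written for this example; they are not PyBrain's Maze or MDPMazeTask, and the reward of 1.0 at the goal and 0.0 elsewhere is an assumption.

```python
class ToyMaze:
    """Hypothetical stand-in for an environment: a position and a goal."""

    def __init__(self, n_cols, start, goal):
        self.n_cols = n_cols
        self.position = start
        self.goal = goal


class ToyMazeTask:
    """Hypothetical task: turns the environment state into observation and reward."""

    def __init__(self, environment):
        self.env = environment

    def get_observation(self):
        # the agent is told its position as a single state number
        row, col = self.env.position
        return row * self.env.n_cols + col

    def get_reward(self):
        # assumed reward scheme: 1.0 on the goal field, 0.0 everywhere else
        return 1.0 if self.env.position == self.env.goal else 0.0


toy_env = ToyMaze(n_cols=9, start=(1, 1), goal=(7, 7))
toy_task = ToyMazeTask(toy_env)

first = (toy_task.get_observation(), toy_task.get_reward())
toy_env.position = (7, 7)  # move the agent onto the goal field
second = (toy_task.get_observation(), toy_task.get_reward())
print(first, second)
```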
Step 5
After adding the agent to the environment, the next step is to create an Experiment, so that the task and the agent can coordinate with each other.
experiment = Experiment(task, agent)
Now we are going to run the experiment 1000 times as shown below −
for i in range(1000):
    experiment.doInteractions(100)
    agent.learn()
    agent.reset()
When the following call is executed, the agent and the task interact with the environment 100 times −
experiment.doInteractions(100)
After each interaction, the environment gives back a new state to the task, which decides what information and reward should be passed to the agent. Inside the for loop, after learning and resetting the agent, we plot the updated table.
for i in range(1000):
    experiment.doInteractions(100)
    agent.learn()
    agent.reset()
    # plot the best action value of every state as a 9 x 9 grid
    pylab.pcolor(controller.params.reshape(81,4).max(1).reshape(9,9))
# save the final plot
pylab.savefig("test.png")
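The expression reshape(81,4).max(1).reshape(9,9) is compact, so here is what each step does, written out with plain lists. The values in flat_params are made up for the example; in the real program they are the learned parameters of the controller.

```python
N_STATES, N_ACTIONS, SIDE = 81, 4, 9

# made-up stand-in for the flat parameter vector of length 81 * 4
flat_params = [float(i % 7) for i in range(N_STATES * N_ACTIONS)]

# reshape(81, 4): one row of 4 action values per state
per_state = [flat_params[s * N_ACTIONS:(s + 1) * N_ACTIONS] for s in range(N_STATES)]

# max(1): keep only the best action value of each state
best = [max(row) for row in per_state]

# reshape(9, 9): lay the 81 state values out like the maze itself
grid = [best[r * SIDE:(r + 1) * SIDE] for r in range(SIDE)]

print(len(grid), len(grid[0]))
```

Each cell of the resulting grid holds the value of the best action in that maze position, which is what the colour plot shows.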
Here is the full code −
Example
maze.py
from numpy import array # array() is used to build the maze
import sys, time
import matplotlib.pyplot as pylab
from pybrain.rl.environments.mazes import Maze, MDPMazeTask
from pybrain.rl.learners.valuebased import ActionValueTable
from pybrain.rl.agents import LearningAgent
from pybrain.rl.learners import Q, QLambda, SARSA #@UnusedImport
from pybrain.rl.explorers import BoltzmannExplorer #@UnusedImport
from pybrain.rl.experiments import Experiment
from pybrain.rl.environments import Task
# create maze array
mazearray = array(
    [[1, 1, 1, 1, 1, 1, 1, 1, 1],
     [1, 0, 0, 1, 0, 0, 0, 0, 1],
     [1, 0, 0, 1, 0, 0, 1, 0, 1],
     [1, 0, 0, 1, 0, 0, 1, 0, 1],
     [1, 0, 0, 1, 0, 1, 1, 0, 1],
     [1, 0, 0, 0, 0, 0, 1, 0, 1],
     [1, 1, 1, 1, 1, 1, 1, 0, 1],
     [1, 0, 0, 0, 0, 0, 0, 0, 1],
     [1, 1, 1, 1, 1, 1, 1, 1, 1]]
)
env = Maze(mazearray, (7, 7))
# create task
task = MDPMazeTask(env)
# the controller in PyBrain is like a module: it takes states as input
# and converts them into actions
controller = ActionValueTable(81, 4)
controller.initialize(1.)
# create the learner - using SARSA()
learner = SARSA()
# create agent with controller and learner
agent = LearningAgent(controller, learner)
# create experiment
experiment = Experiment(task, agent)
# prepare plotting
pylab.gray()
pylab.ion()
for i in range(1000):
    experiment.doInteractions(100)
    agent.learn()
    agent.reset()
    # plot the best action value of every state as a 9 x 9 grid
    pylab.pcolor(controller.params.reshape(81,4).max(1).reshape(9,9))
# save the final plot
pylab.savefig("test.png")
PyBrain - API & Tools
Now we know how to build a network and train it. In this chapter, we will see how to create and save a network, and how to load it again whenever required.
Save and Recover Network
We are going to make use of NetworkWriter and NetworkReader from the Pybrain tools module pybrain.tools.customxml.
Here is a working example −
from pybrain.tools.shortcuts import buildNetwork
from pybrain.tools.customxml import NetworkWriter
from pybrain.tools.customxml import NetworkReader
net = buildNetwork(2,1,1)
NetworkWriter.writeToFile(net, 'network.xml')
net = NetworkReader.readFrom('network.xml')
The network is saved in network.xml −
NetworkWriter.writeToFile(net, 'network.xml')
To read the XML back when required, we can use the following code −
net = NetworkReader.readFrom('network.xml')
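Because network.xml is plain XML, it can also be inspected without PyBrain. The following is a minimal sketch that uses Python's standard xml.etree.ElementTree module on a trimmed copy of the file, keeping only two of its connections. The dictionary name connections is our own choice, and this is only a way to look at the file, not how NetworkReader works internally.

```python
import ast
import xml.etree.ElementTree as ET

# trimmed copy of network.xml with two of its connections
xml_text = """<?xml version="1.0" ?>
<PyBrain>
  <Network class="pybrain.structure.networks.feedforward.FeedForwardNetwork" name="FeedForwardNetwork-8">
    <Connections>
      <FullConnection class="pybrain.structure.connections.full.FullConnection" name="FullConnection-4">
        <inmod val="in"/>
        <outmod val="hidden0"/>
        <Parameters>[-0.9429546042034236, -0.09858196752687162]</Parameters>
      </FullConnection>
      <FullConnection class="pybrain.structure.connections.full.FullConnection" name="FullConnection-5">
        <inmod val="hidden0"/>
        <outmod val="out"/>
        <Parameters>[-0.29205472354634304]</Parameters>
      </FullConnection>
    </Connections>
  </Network>
</PyBrain>
"""

root = ET.fromstring(xml_text)

# map (source module, target module) to the list of weights on that connection
connections = {}
for conn in root.iter("FullConnection"):
    source = conn.find("inmod").get("val")
    target = conn.find("outmod").get("val")
    weights = ast.literal_eval(conn.find("Parameters").text)
    connections[(source, target)] = weights

for (source, target), weights in connections.items():
    print(source, "->", target, weights)
```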
Here is the network.xml file that was created −
<?xml version="1.0" ?>
<PyBrain>
<Network class="pybrain.structure.networks.feedforward.FeedForwardNetwork" name="FeedForwardNetwork-8">
<name val="'FeedForwardNetwork-8'"/>
<Modules>
<LinearLayer class="pybrain.structure.modules.linearlayer.LinearLayer" inmodule="True" name="in">
<name val="'in'"/>
<dim val="2"/>
</LinearLayer>
<LinearLayer class="pybrain.structure.modules.linearlayer.LinearLayer" name="out" outmodule="True">
<name val="'out'"/>
<dim val="1"/>
</LinearLayer>
<BiasUnit class="pybrain.structure.modules.biasunit.BiasUnit" name="bias">
<name val="'bias'"/>
</BiasUnit>
<SigmoidLayer class="pybrain.structure.modules.sigmoidlayer.SigmoidLayer" name="hidden0">
<name val="'hidden0'"/>
<dim val="1"/>
</SigmoidLayer>
</Modules>
<Connections>
<FullConnection class="pybrain.structure.connections.full.FullConnection" name="FullConnection-6">
<inmod val="bias"/>
<outmod val="out"/>
<Parameters>[1.2441093186965146]</Parameters>
</FullConnection>
<FullConnection class="pybrain.structure.connections.full.FullConnection" name="FullConnection-7">
<inmod val="bias"/>
<outmod val="hidden0"/>
<Parameters>[-1.5743530012126412]</Parameters>
</FullConnection>
<FullConnection class="pybrain.structure.connections.full.FullConnection" name="FullConnection-4">
<inmod val="in"/>
<outmod val="hidden0"/>
<Parameters>[-0.9429546042034236, -0.09858196752687162]</Parameters>
</FullConnection>
<FullConnection class="pybrain.structure.connections.full.FullConnection" name="FullConnection-5">
<inmod val="hidden0"/>
<outmod val="out"/>
<Parameters>[-0.29205472354634304]</Parameters>
</FullConnection>
</Connections>
</Network>
</PyBrain>
API
Below is a list of the APIs that we have used throughout this tutorial.
For Networks
- activate(input) − Takes the input values to be tested and returns the network's output for them.
- activateOnDataset(dataset) − Iterates over the given dataset and returns the output for each sample.
- addConnection(c) − Adds a connection to the network.
- addInputModule(m) − Adds the given module to the network and marks it as an input module.
- addModule(m) − Adds the given module to the network.
- addOutputModule(m) − Adds the given module to the network and marks it as an output module.
- reset() − Resets the modules and the network.
- sortModules() − Prepares the network for activation by sorting its modules internally. It has to be called before activation.
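To show what activate(input) computes for a small network, the sketch below hand-codes the forward pass of the 2-1-1 network saved in network.xml in the previous section, using the weights from that file. It assumes the layout recorded there (a linear input layer, one sigmoid hidden unit, a linear output, and a bias connected to both the hidden and the output layer); the constant names and this stand-alone activate() function are our own, not PyBrain code.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# weights copied from the network.xml listing in the previous section
W_IN_HIDDEN = [-0.9429546042034236, -0.09858196752687162]  # in -> hidden0
B_HIDDEN = -1.5743530012126412                             # bias -> hidden0
W_HIDDEN_OUT = -0.29205472354634304                        # hidden0 -> out
B_OUT = 1.2441093186965146                                 # bias -> out


def activate(inputs):
    """Forward pass: weighted sum plus bias, sigmoid, then a linear output."""
    hidden_input = sum(w * x for w, x in zip(W_IN_HIDDEN, inputs)) + B_HIDDEN
    hidden = sigmoid(hidden_input)
    return [W_HIDDEN_OUT * hidden + B_OUT]


out = activate([0, 0])[0]
print(out)
```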
For Supervised Datasets
- addSample(inp, target) − Adds a new sample consisting of an input and a target.
- splitWithProportion(proportion=0.5) − Divides the dataset into two parts: the first contains the given proportion of the data and the second contains the remainder.
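The idea behind splitWithProportion() can be sketched on a plain list of samples. This is an illustration under stated assumptions, not PyBrain's code: the function name split_with_proportion, the seed parameter, the random shuffling, and rounding the split point down are all choices made for this sketch.

```python
import random


def split_with_proportion(samples, proportion=0.5, seed=0):
    """Shuffle the samples and cut them into two parts at the given proportion."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    cut = int(len(samples) * proportion)  # split point, rounded down
    first = [samples[i] for i in indices[:cut]]
    second = [samples[i] for i in indices[cut:]]
    return first, second


# eight (input, target) pairs: the NOR truth table repeated twice
samples = [((a, b), (int(not (a or b)),)) for a in (0, 1) for b in (0, 1)] * 2

first_part, second_part = split_with_proportion(samples, proportion=0.25)
print(len(first_part), len(second_part))
```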
For Trainers
- trainUntilConvergence(dataset=None, maxEpochs=None, verbose=None, continueEpochs=10, validationProportion=0.25) − Trains the module on the dataset until it converges. If no dataset is given, it trains on the dataset that the trainer was initialized with.
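The general idea of training until convergence with a validation set can be sketched as a simple stopping rule: keep training while the validation error improves, and stop once it has not improved for a given number of epochs or the epoch limit is reached. The function below is a simplified illustration of that idea, not PyBrain's exact criterion; the name epochs_until_stop and the list of validation errors are assumptions made for the example.

```python
def epochs_until_stop(validation_errors, max_epochs=None, continue_epochs=10):
    """Return (epochs_run, best_epoch) for a sequence of per-epoch validation errors."""
    best_error = float("inf")
    best_epoch = 0
    epochs_run = 0
    for epoch, error in enumerate(validation_errors, start=1):
        if max_epochs is not None and epoch > max_epochs:
            break  # epoch limit reached
        epochs_run = epoch
        if error < best_error:
            best_error, best_epoch = error, epoch
        elif epoch - best_epoch >= continue_epochs:
            break  # no improvement for continue_epochs epochs
    return epochs_run, best_epoch


# made-up validation errors: improvement stops after the third epoch
errors = [0.5, 0.4, 0.3, 0.35, 0.36, 0.37, 0.2]

patience_result = epochs_until_stop(errors, continue_epochs=3)
limited_result = epochs_until_stop(errors, max_epochs=2, continue_epochs=3)
print(patience_result, limited_result)
```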
PyBrain - Examples
This chapter lists examples that were executed using PyBrain.
Example 1
Working with the NOR truth table and testing the network for correctness.
from pybrain.tools.shortcuts import buildNetwork
from pybrain.structure import TanhLayer
from pybrain.datasets import SupervisedDataSet
from pybrain.supervised.trainers import BackpropTrainer
# Create a network with two inputs, three hidden, and one output
nn = buildNetwork(2, 3, 1, bias=True, hiddenclass=TanhLayer)
# Create a training dataset that matches network input and output sizes:
norgate = SupervisedDataSet(2, 1)
# Create a second dataset to be used for testing.
nortrain = SupervisedDataSet(2, 1)
# Add input and target values to the training dataset
# Values for NOR truth table
norgate.addSample((0, 0), (1,))
norgate.addSample((0, 1), (0,))
norgate.addSample((1, 0), (0,))
norgate.addSample((1, 1), (0,))
# Add the same NOR truth table values to the testing dataset
nortrain.addSample((0, 0), (1,))
nortrain.addSample((0, 1), (0,))
nortrain.addSample((1, 0), (0,))
nortrain.addSample((1, 1), (0,))
# Train the network with dataset norgate.
trainer = BackpropTrainer(nn, norgate)
# Run the loop 1000 times to train it.
for epoch in range(1000):
    trainer.train()
# Test the trained network on dataset nortrain.
trainer.testOnData(dataset=nortrain, verbose = True)
Output
C:\pybrain\pybrain\src>python testnetwork.py
Testing on data:
('out: ', '[0.887 ]')
('correct:', '[1 ]')
error: 0.00637334
('out: ', '[0.149 ]')
('correct:', '[0 ]')
error: 0.01110338
('out: ', '[0.102 ]')
('correct:', '[0 ]')
error: 0.00522736
('out: ', '[-0.163]')
('correct:', '[0 ]')
error: 0.01328650
('All errors:', [0.006373344564625953, 0.01110338071737218,
0.005227359234093431, 0.01328649974219942])
('Average error:', 0.008997646064572746)
('Max error:', 0.01328649974219942, 'Median error:', 0.01110338071737218)
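The outputs above are real numbers rather than exact 0 or 1 values, so checking correctness means rounding them to the nearest class. The sketch below does this for the four outputs printed in the run above, assuming they appear in the order the samples were added; the 0.5 threshold and the helper name to_bit are our own choices.

```python
# NOR truth table: the output is 1 only when both inputs are 0
nor_truth_table = {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 0}

# outputs copied from the run above, assumed to be in sample order
network_outputs = {(0, 0): 0.887, (0, 1): 0.149, (1, 0): 0.102, (1, 1): -0.163}


def to_bit(value, threshold=0.5):
    """Round a real-valued network output to 0 or 1."""
    return 1 if value >= threshold else 0


predictions = {inputs: to_bit(out) for inputs, out in network_outputs.items()}
all_correct = predictions == nor_truth_table
print(predictions, all_correct)
```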
Example 2
For the dataset, we are going to use the load_digits dataset from sklearn (see scikit-learn.org).
It has 10 classes, i.e., the digits 0-9 to be predicted.
Each sample in X has 64 input values.
from sklearn import datasets
import matplotlib.pyplot as plt
from pybrain.datasets import ClassificationDataSet
from pybrain.utilities import percentError
from pybrain.tools.shortcuts import buildNetwork
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.structure.modules import SoftmaxLayer
from numpy import ravel
digits = datasets.load_digits()
X, y = digits.data, digits.target
# the inputs are 64-dimensional arrays and, since the digits run from 0-9,
# the number of classes is 10
ds = ClassificationDataSet(64, 1, nb_classes=10)
for i in range(len(X)):
    ds.addSample(ravel(X[i]), y[i]) # adding sample to the dataset
# split the dataset: 25% as test data and 75% as training data
test_data_temp, training_data_temp = ds.splitWithProportion(0.25)
# splitWithProportion() converts the dataset to a SupervisedDataSet, so we
# convert both parts back to ClassificationDataSet in the loops below
test_data = ClassificationDataSet(64, 1, nb_classes=10)
for n in range(0, test_data_temp.getLength()):
    test_data.addSample(
        test_data_temp.getSample(n)[0], test_data_temp.getSample(n)[1]
    )
training_data = ClassificationDataSet(64, 1, nb_classes=10)
for n in range(0, training_data_temp.getLength()):
    training_data.addSample(
        training_data_temp.getSample(n)[0], training_data_temp.getSample(n)[1]
    )
test_data._convertToOneOfMany()
training_data._convertToOneOfMany()
# create a network whose input and output sizes are taken from the training data
net = buildNetwork(
    training_data.indim, 64, training_data.outdim, outclass=SoftmaxLayer
)
trainer = BackpropTrainer(
    net, dataset=training_data, momentum=0.1, learningrate=0.01, verbose=True, weightdecay=0.01
)
# train the network
trnerr, valerr = trainer.trainUntilConvergence(dataset=training_data, maxEpochs=10)
# visualize the training and validation errors
plt.plot(trnerr, 'b', valerr, 'r')
plt.show()
trainer.trainEpochs(10)
print('Percent Error on testData:', percentError(
    trainer.testOnClassData(dataset=test_data), test_data['class']
))
Output
Total error: 0.0432857814358
Total error: 0.0222276374185
Total error: 0.0149012052174
Total error: 0.011876985318
Total error: 0.00939854792853
Total error: 0.00782202445183
Total error: 0.00714707652044
Total error: 0.00606068893793
Total error: 0.00544257958975
Total error: 0.00463929281336
Total error: 0.00441275665294
('train-errors:', '[0.043286 , 0.022228 , 0.014901 , 0.011877 , 0.009399 , 0.007
822 , 0.007147 , 0.006061 , 0.005443 , 0.004639 , 0.004413 ]')
('valid-errors:', '[0.074296 , 0.027332 , 0.016461 , 0.014298 , 0.012129 , 0.009
248 , 0.008922 , 0.007917 , 0.006547 , 0.005883 , 0.006572 , 0.005811 ]')
Percent Error on testData: 3.34075723830735
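Two ideas from this example can be sketched in plain Python: the one-of-many (one-hot) target encoding and the percent error. The helper names one_of_many and percent_error are our own and are not PyBrain functions. The numbers rest on two assumptions: that load_digits holds 1797 samples and that the 25% split rounds down to 449 test samples. Under those assumptions, the reported error is consistent with 15 misclassified test samples.

```python
def one_of_many(label, n_classes=10):
    """Encode a class label as a list with a single 1 at the label's position."""
    return [1 if i == label else 0 for i in range(n_classes)]


def percent_error(predicted, actual):
    """Percentage of positions where the predicted class differs from the actual one."""
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return 100.0 * wrong / len(actual)


encoded = one_of_many(3)

n_test = int(1797 * 0.25)                   # assumed test set size
actual = [i % 10 for i in range(n_test)]    # made-up class labels
predicted = list(actual)
for i in range(15):                         # make exactly 15 predictions wrong
    predicted[i] = (predicted[i] + 1) % 10

error = percent_error(predicted, actual)
print(encoded, n_test, error)
```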