PyBrain Concise Tutorial
PyBrain - Working with Datasets
Datasets are the input data used to test, validate, and train networks. The type of dataset to use depends on the machine learning task at hand. This chapter covers the following:
- Creating Dataset
- Adding Data to Dataset
We will first learn how to create a dataset and then test it with the given input.
Creating Dataset
To create a dataset, we use the PyBrain dataset package: pybrain.datasets.
PyBrain supports dataset classes such as SupervisedDataSet, SequentialDataSet, and ClassificationDataSet. Which one to use depends on the machine learning task you are implementing. SupervisedDataSet is the simplest, and it is the one we use here.
A SupervisedDataSet takes two parameters: the size of the input and the size of the target. Consider the XOR truth table shown below:
A | B | A XOR B
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0
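The truth table can also be generated rather than typed by hand. The following is a minimal sketch in plain Python, independent of PyBrain; the name `xor_rows` is chosen here purely for illustration. It relies on Python's bitwise `^` operator, which computes XOR on the integers 0 and 1.

```python
from itertools import product

# Build every (A, B) combination of 0 and 1 and pair it with A XOR B.
# Each row has the shape [(A, B), (A XOR B,)]: two inputs, one target.
xor_rows = [[(a, b), (a ^ b,)] for a, b in product((0, 1), repeat=2)]

for inputs, target in xor_rows:
    print(inputs, target)
```

Each printed line corresponds to one row of the table, in the same order.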
Each row of the table has two input values (A and B) and one output value (A XOR B). The input therefore has size 2 and the target has size 1, so we create the dataset with the sizes 2, 1.
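To make the sizes 2 and 1 concrete, the short sketch below checks that every row of an XOR table has two input values and one target value. It is plain Python and not part of PyBrain; the helper name `sample_dims` is an assumption made for this illustration.

```python
# XOR truth table as (inputs, target) pairs.
samples = [
    ((0, 0), (0,)),
    ((0, 1), (1,)),
    ((1, 0), (1,)),
    ((1, 1), (0,)),
]


def sample_dims(rows):
    """Return the (input size, target size) shared by all rows.

    Raises ValueError if the rows disagree on their sizes.
    """
    dims = {(len(inputs), len(target)) for inputs, target in rows}
    if len(dims) != 1:
        raise ValueError("rows have inconsistent sizes: %s" % sorted(dims))
    return dims.pop()


print(sample_dims(samples))  # (2, 1)
```

The two numbers it reports are the ones passed when the dataset is constructed.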
createdataset.py
from pybrain.datasets import SupervisedDataSet
sds = SupervisedDataSet(2, 1)
print(sds)
Running the code above with python createdataset.py produces the following output:
C:\pybrain\pybrain\src>python createdataset.py
input: dim(0, 2)
[]
target: dim(0, 1)
[]
The output shows an input of size 2 and a target of size 1, as seen above. Both arrays are empty because no samples have been added yet.
Adding Data to Dataset
Let us now add the sample data to the dataset.
createdataset.py
from pybrain.datasets import SupervisedDataSet
sds = SupervisedDataSet(2, 1)
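# Each entry pairs an input tuple (A, B) with a target tuple (A XOR B,).
xorModel = [
    [(0,0), (0,)],
    [(0,1), (1,)],
    [(1,0), (1,)],
    [(1,1), (0,)],
]
# Add every input/target pair to the dataset.
for input, target in xorModel:
    sds.addSample(input, target)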
print("Input is:")
print(sds['input'])
print("\nTarget is:")
print(sds['target'])
We have created an xorModel array, as shown below:
xorModel = [
    [(0,0), (0,)],
    [(0,1), (1,)],
    [(1,0), (1,)],
    [(1,1), (0,)],
]
To add data to the dataset, we use the addSample() method, which takes an input and a target.
We loop through the xorModel array and pass each input and target pair to addSample(), as shown below:
for input, target in xorModel:
    sds.addSample(input, target)
Executing the script gives the following output:
C:\pybrain\pybrain\src>python createdataset.py
Input is:
[[0. 0.]
 [0. 1.]
 [1. 0.]
 [1. 1.]]

Target is:
[[0.]
 [1.]
 [1.]
 [0.]]
You can read the inputs and targets back from the dataset by indexing it with 'input' and 'target', as shown below:
print(sds['input'])
print(sds['target'])
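As a rough analogy for what the 'input' and 'target' fields hold, the sketch below splits the same list of samples into one collection of inputs and one collection of targets. It is plain Python and only an illustration of the data layout, not how PyBrain stores samples internally.

```python
# The same XOR samples used in createdataset.py.
xorModel = [
    [(0, 0), (0,)],
    [(0, 1), (1,)],
    [(1, 0), (1,)],
    [(1, 1), (0,)],
]

# zip(*rows) transposes the list of [input, target] pairs into
# one tuple holding all inputs and one tuple holding all targets.
inputs, targets = zip(*xorModel)

print(list(inputs))   # four rows of two values each
print(list(targets))  # four rows of one value each
```

The inputs form four rows of two values and the targets four rows of one value, matching the layout of the two arrays printed by the script.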