PyBrain Concise Tutorial

PyBrain - Importing Data For Datasets

In this chapter, we will learn how to obtain data for use with PyBrain datasets.

The most common ways to obtain data are:

  1. Using sklearn

  2. From CSV file

Using sklearn


Details of the datasets available from sklearn are at: https://scikit-learn.org/stable/datasets/toy_dataset.html

Here are a few examples of how to use datasets from sklearn:

Example 1: load_digits()

from numpy import ravel
from sklearn import datasets
from pybrain.datasets import ClassificationDataSet

# Load the digits data: 64 features per sample, 10 classes
digits = datasets.load_digits()
X, y = digits.data, digits.target
ds = ClassificationDataSet(64, 1, nb_classes=10)
for i in range(len(X)):
    ds.addSample(ravel(X[i]), y[i])

Example 2: load_iris()

from sklearn import datasets
from pybrain.datasets import ClassificationDataSet

# Load the iris data: 4 features per sample, 3 classes
iris = datasets.load_iris()
X, y = iris.data, iris.target
ds = ClassificationDataSet(4, 1, nb_classes=3)
for i in range(len(X)):
    ds.addSample(X[i], y[i])
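
The two constructor calls above hard-code the input size and the class count (64 and 10 for digits, 4 and 3 for iris). As a minimal sketch, those numbers could instead be derived from the data itself. The helper name `dataset_dimensions` is hypothetical and is not part of PyBrain or sklearn; it works on plain Python sequences so that it can be tried without either library installed.

```python
def dataset_dimensions(X, y):
    """Return (n_features, n_classes) for a feature matrix X and label list y.

    Hypothetical helper: assumes every row of X has the same length and
    that y holds one hashable class label per row.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    if len(X) == 0:
        raise ValueError("X must contain at least one row")
    n_features = len(X[0])
    if any(len(row) != n_features for row in X):
        raise ValueError("all rows of X must have the same length")
    n_classes = len(set(y))
    return n_features, n_classes


# Tiny stand-in for iris-like data: 4 features per row, 3 distinct labels.
X = [[5.1, 3.5, 1.4, 0.2], [7.0, 3.2, 4.7, 1.4], [6.3, 3.3, 6.0, 2.5]]
y = [0, 1, 2]
n_features, n_classes = dataset_dimensions(X, y)
print(n_features, n_classes)
```

With the sklearn arrays, the result could then be passed on as `ClassificationDataSet(n_features, 1, nb_classes=n_classes)`, which matches the constructor calls used in the examples.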

From CSV file

We can also use data from a CSV file as follows.

Here is sample data for the XOR truth table: datasettest.csv

[Image: contents of datasettest.csv]
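
The file contents are not reproduced in this text. Judging from the arrays printed in the Output section, the first column holds the XOR result and the next two columns hold the inputs, with one header row. A minimal sketch for creating such a file with Python's standard csv module follows; the column names `output`, `in1`, and `in2` and the function name `write_xor_csv` are assumptions, since the original header row is not shown.

```python
import csv
import os

# Rows are (output, input1, input2), matching the order the example reads:
# column 0 is the target, the remaining columns are the inputs.
XOR_ROWS = [
    (0, 0, 0),
    (1, 0, 1),
    (1, 1, 0),
    (0, 1, 1),
]


def write_xor_csv(path):
    """Write the XOR truth table to `path` with a one-line header."""
    folder = os.path.dirname(path)
    if folder:
        os.makedirs(folder, exist_ok=True)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["output", "in1", "in2"])  # hypothetical column names
        writer.writerows(XOR_ROWS)


if __name__ == "__main__":
    # Same relative path that the reading example expects.
    write_xor_csv("data/datasettest.csv")
```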

Here is a working example that reads the data from the .csv file to build a dataset.

Example

from pybrain.tools.shortcuts import buildNetwork
from pybrain.structure import TanhLayer
from pybrain.datasets import SupervisedDataSet
from pybrain.supervised.trainers import BackpropTrainer
import pandas as pd

print('Read data...')
df = pd.read_csv('data/datasettest.csv',header=0).head(1000)
data = df.values

train_output = data[:,0]
train_data = data[:,1:]

print(train_output)
print(train_data)

# Create a network with two inputs, three hidden, and one output
nn = buildNetwork(2, 3, 1, bias=True, hiddenclass=TanhLayer)

# Create a dataset that matches network input and output sizes:
_gate = SupervisedDataSet(2, 1)

# Create a second, empty dataset with the same dimensions (not used below).
nortrain = SupervisedDataSet(2, 1)

# Add input and target values to the dataset
# Values for the XOR truth table
for i in range(0, len(train_output)):
   _gate.addSample(train_data[i], train_output[i])

# Train the network with the dataset _gate.
trainer = BackpropTrainer(nn, _gate)

# Run the training loop for 1000 epochs.
for epoch in range(1000):
   trainer.train()
trainer.testOnData(dataset=_gate, verbose = True)

As the example shows, pandas is used to read the data from the CSV file.
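
If pandas is not available, the same column split can be sketched with the standard library. This is an alternative to the approach in the example, not what the example does. The CSV text and its header names are assumptions chosen to match the printed arrays, and `split_columns` is a hypothetical helper name.

```python
import csv
import io

# Assumed file contents: header row, then (output, input1, input2) per line.
CSV_TEXT = """output,in1,in2
0,0,0
1,0,1
1,1,0
0,1,1
"""


def split_columns(text):
    """Return (targets, inputs): column 0 versus the remaining columns."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row, like header=0 in pd.read_csv
    targets, inputs = [], []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        values = [int(cell) for cell in row]
        targets.append(values[0])
        inputs.append(values[1:])
    return targets, inputs


train_output, train_data = split_columns(CSV_TEXT)
print(train_output)  # [0, 1, 1, 0]
print(train_data)    # [[0, 0], [0, 1], [1, 0], [1, 1]]
```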

Output

C:\pybrain\pybrain\src>python testcsv.py
Read data...
[0 1 1 0]
[
   [0 0]
   [0 1]
   [1 0]
   [1 1]
]
Testing on data:
('out: ', '[0.004 ]')
('correct:', '[0 ]')
error: 0.00000795
('out: ', '[0.997 ]')
('correct:', '[1 ]')
error: 0.00000380
('out: ', '[0.996 ]')
('correct:', '[1 ]')
error: 0.00000826
('out: ', '[0.004 ]')
('correct:', '[0 ]')
error: 0.00000829
('All errors:', [7.94733477723902e-06, 3.798267582566822e-06, 8.260969076585322e-06, 8.286246525558165e-06])
('Average error:', 7.073204490487332e-06)
('Max error:', 8.286246525558165e-06, 'Median error:', 8.260969076585322e-06)
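
The summary lines at the end of the output can be reproduced from the "All errors" list. The sketch below uses only the values printed above. Note that the printed median equals the upper of the two middle values in sorted order rather than their average; that indexing rule is inferred from these numbers, not taken from PyBrain's documentation.

```python
# Per-sample errors copied from the "All errors" line of the output.
errors = [
    7.94733477723902e-06,
    3.798267582566822e-06,
    8.260969076585322e-06,
    8.286246525558165e-06,
]

average_error = sum(errors) / len(errors)
max_error = max(errors)
# Inferred rule: take the element at index len // 2 of the sorted list.
median_error = sorted(errors)[len(errors) // 2]

print("Average error:", average_error)
print("Max error:", max_error)
print("Median error:", median_error)
```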