PyBrain: A Concise Tutorial

PyBrain - Overview

PyBrain is an open-source machine learning library implemented in Python. It offers easy-to-use training algorithms for networks, along with datasets and trainers to train and test a network.

The official documentation defines PyBrain as follows −

PyBrain is a modular Machine Learning Library for Python. Its goal is to offer flexible, easy-to-use yet still powerful algorithms for Machine Learning Tasks and a variety of predefined environments to test and compare your algorithms.

PyBrain is short for Python-Based Reinforcement Learning, Artificial Intelligence, and Neural Network Library. In fact, we came up with the name first and later reverse-engineered this quite descriptive "Backronym".

Features of PyBrain

The following are the features of PyBrain −

Networks

A network is composed of modules that are linked together by connections. PyBrain supports neural networks such as feed-forward networks and recurrent networks.

A feed-forward network is a neural network in which information moves between nodes only in the forward direction and never travels backward. It is the earliest and simplest type of artificial neural network.

Information passes from the input nodes to the hidden nodes and then to the output nodes.
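The forward flow of information can be illustrated with a minimal sketch in plain Python. It does not use PyBrain; the function names, the sigmoid activation, and the weight values are all assumptions chosen for illustration.

```python
import math


def sigmoid(x):
    """Squash a value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, biases):
    """Compute one layer: a weighted sum per node, followed by a sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]


def feed_forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Pass data from the input nodes to the hidden nodes, then to the output nodes."""
    hidden = layer_forward(inputs, hidden_w, hidden_b)
    return layer_forward(hidden, output_w, output_b)


# Hypothetical weights for a network with 2 inputs, 2 hidden nodes, and 1 output.
HIDDEN_W = [[0.5, -0.5], [0.25, 0.75]]
HIDDEN_B = [0.0, 0.0]
OUTPUT_W = [[1.0, -1.0]]
OUTPUT_B = [0.0]

output = feed_forward([0.0, 0.0], HIDDEN_W, HIDDEN_B, OUTPUT_W, OUTPUT_B)
print(output)  # prints [0.5]
```

Nothing in `feed_forward` ever feeds a later layer's result back into an earlier one, which is what makes the network feed-forward.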

Recurrent networks are similar to feed-forward networks; the difference is that a recurrent network has to remember the data at each step, so the history of each step must be saved.
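As a sketch of how modules and connections fit together, the function below wires up a small network by hand using classes from `pybrain.structure`. It assumes PyBrain is installed; the function name `build_network` and the 2-3-1 layer sizes are illustrative choices, not something prescribed by the library.

```python
def build_network(recurrent=False):
    """Wire a 2-3-1 network from individual modules and connections."""
    # Imported inside the function so PyBrain is needed only when it is called.
    from pybrain.structure import (FeedForwardNetwork, FullConnection,
                                   LinearLayer, RecurrentNetwork, SigmoidLayer)

    net = RecurrentNetwork() if recurrent else FeedForwardNetwork()

    # Modules: one input layer, one hidden layer, one output layer.
    in_layer = LinearLayer(2)
    hidden_layer = SigmoidLayer(3)
    out_layer = LinearLayer(1)
    net.addInputModule(in_layer)
    net.addModule(hidden_layer)
    net.addOutputModule(out_layer)

    # Connections: every node of one layer feeds every node of the next.
    net.addConnection(FullConnection(in_layer, hidden_layer))
    net.addConnection(FullConnection(hidden_layer, out_layer))

    if recurrent:
        # The hidden layer also receives its own output from the previous step.
        net.addRecurrentConnection(FullConnection(hidden_layer, hidden_layer))

    # Required before the network can be activated.
    net.sortModules()
    return net
```

Calling `build_network().activate([1, 2])` returns the output for one input. The weights are initialized randomly, so the value differs between runs. With `recurrent=True`, repeated calls to `activate` depend on earlier inputs until `reset()` is called on the network.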

Datasets

A dataset is the data used to train, validate, and test a network. The type of dataset to use depends on the machine learning task at hand. The datasets most commonly used with PyBrain are SupervisedDataSet and ClassificationDataSet.

SupervisedDataSet − It consists of two fields, input and target. It is the simplest form of dataset and is mainly used for supervised learning tasks.

ClassificationDataSet − It is mainly used for classification problems. It has the input and target fields plus an extra field called "class", which is an automatic backup of the given targets. For example, the output may be either 1 or 0, or the outputs may be grouped by the values of the given inputs, so that each sample falls into one particular class.
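The purpose of the "class" backup becomes clearer when targets are converted to a one-of-many (one-hot) representation. PyBrain's ClassificationDataSet offers a `_convertToOneOfMany()` method for this conversion, and it keeps the original targets in the "class" field. The sketch below does not call PyBrain; it only mirrors the idea in plain Python, and the function name `to_one_of_many` is hypothetical.

```python
def to_one_of_many(targets):
    """Return (one_hot_targets, class_backup) for a list of class labels."""
    classes = sorted(set(targets))
    position = {label: i for i, label in enumerate(classes)}
    # One output slot per class; the slot of the sample's class is set to 1.
    one_hot = [
        [1 if position[label] == i else 0 for i in range(len(classes))]
        for label in targets
    ]
    # Keep the original labels, as the "class" field does.
    class_backup = list(targets)
    return one_hot, class_backup


labels = [0, 1, 2, 1]
one_hot, backup = to_one_of_many(labels)
print(one_hot)  # prints [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
print(backup)   # prints [0, 1, 2, 1]
```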

Trainer

When we create a neural network, it is trained on the training data given to it. Whether the network has been trained properly is judged by how well it predicts the test data. The most important concepts in PyBrain training are BackpropTrainer and its trainUntilConvergence method.

BackpropTrainer − A trainer that trains the parameters of a module on a SupervisedDataSet or ClassificationDataSet (potentially sequential) by backpropagating the errors (through time).

trainUntilConvergence − A trainer method that trains the module on the dataset until it converges.
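To make "train until it converges" concrete, here is a minimal sketch in plain Python. It is not PyBrain's implementation: it uses ordinary gradient descent on a model with a single weight, and it stops once the error changes by less than a tolerance between epochs. All names, the learning rate, and the tolerance are assumptions made for the example.

```python
def mean_squared_error(weight, samples):
    """Average squared difference between prediction and target."""
    return sum((weight * x - y) ** 2 for x, y in samples) / len(samples)


def train_epoch(weight, samples, learning_rate):
    """One gradient-descent update of the weight over all samples."""
    gradient = sum(2 * (weight * x - y) * x for x, y in samples) / len(samples)
    return weight - learning_rate * gradient


def train_until_convergence(samples, learning_rate=0.1, tolerance=1e-9,
                            max_epochs=1000):
    """Train until the error stops changing, or until max_epochs is reached."""
    weight = 0.0
    errors = [mean_squared_error(weight, samples)]
    for _ in range(max_epochs):
        weight = train_epoch(weight, samples, learning_rate)
        errors.append(mean_squared_error(weight, samples))
        # Converged: the error barely moved compared with the previous epoch.
        if abs(errors[-2] - errors[-1]) < tolerance:
            break
    return weight, errors


SAMPLES = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # target = 2 * input
weight, errors = train_until_convergence(SAMPLES)
print(round(weight, 3))  # prints 2.0
```

The returned `errors` list holds one value per epoch, so it can be plotted to see how quickly training settled.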

Tools

PyBrain offers tool modules that help build a network. The main one is the buildNetwork shortcut, imported from pybrain.tools.shortcuts.
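A short sketch of the shortcut in use follows. It assumes PyBrain is installed; the wrapper function `build_demo_network`, the layer sizes, and the choice of `TanhLayer` for the hidden layer are illustrative.

```python
def build_demo_network():
    """Build a 2-3-1 network with buildNetwork and run one input through it."""
    # Imported inside the function so PyBrain is needed only when it is called.
    from pybrain.structure import TanhLayer
    from pybrain.tools.shortcuts import buildNetwork

    # 2 input nodes, 3 hidden nodes, 1 output node, with bias units.
    net = buildNetwork(2, 3, 1, bias=True, hiddenclass=TanhLayer)

    # Weights start out random, so the output differs from run to run.
    output = net.activate([2, 1])
    return net, output
```

The positional arguments are the layer sizes from input to output, so `buildNetwork(2, 3, 1)` saves wiring the layers and connections by hand.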

Visualization

PyBrain has no built-in way to visualize test data. However, it works with other libraries, such as Matplotlib and its pyplot module, to visualize the data.

Advantages of PyBrain

The advantages of PyBrain are −

  1. PyBrain is a free, open-source library for machine learning. It is a good starting point for newcomers to the field.

  2. PyBrain is implemented in Python, which makes development faster than in languages such as Java or C++.

  3. PyBrain works easily with other Python libraries to visualize data.

  4. PyBrain supports popular neural network types such as feed-forward networks and recurrent networks.

  5. Loading datasets from .csv files is easy in PyBrain. It also allows using datasets from other libraries.

  6. Training and testing on data is easy with PyBrain trainers.

Limitations of PyBrain

PyBrain offers limited help for issues you run into. Some questions on Stack Overflow and in the Google Group remain unanswered.

Workflow of PyBrain

According to the PyBrain documentation, the flow of machine learning is as shown in the following figure −

[Figure: workflow of PyBrain]

At the start, we have raw data, which can be used with PyBrain after preprocessing.

The PyBrain flow starts with a dataset, which is divided into training data and test data.

  1. The network is created, and the dataset and the network are given to the trainer.

  2. The trainer trains the network on the data and reports the training error and the validation error, which can be visualized.

  3. The test data can then be used to check whether the network's output matches the expected output.
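The steps above can be sketched with PyBrain as follows. This assumes PyBrain is installed; the function name `run_workflow`, the XOR samples, the network size, and the epoch count are illustrative. Because XOR has only four samples, the sketch reuses them as test data, which a real project would avoid by holding out a separate test set.

```python
# Hypothetical toy data: inputs and targets of the XOR function.
XOR_SAMPLES = [((0, 0), (0,)), ((0, 1), (1,)), ((1, 0), (1,)), ((1, 1), (0,))]


def run_workflow(epochs=1000):
    """Dataset -> network -> trainer -> test, using a toy XOR problem."""
    # Imported inside the function so PyBrain is needed only when it is called.
    from pybrain.datasets import SupervisedDataSet
    from pybrain.supervised.trainers import BackpropTrainer
    from pybrain.tools.shortcuts import buildNetwork

    # Datasets with 2 input values and 1 target value per sample.
    train_data = SupervisedDataSet(2, 1)
    test_data = SupervisedDataSet(2, 1)
    for inputs, target in XOR_SAMPLES:
        train_data.addSample(inputs, target)
        test_data.addSample(inputs, target)

    # Step 1: create the network and hand it to the trainer with the dataset.
    net = buildNetwork(2, 3, 1, bias=True)
    trainer = BackpropTrainer(net, train_data)

    # Step 2: train; train() runs one epoch and returns that epoch's error.
    training_errors = [trainer.train() for _ in range(epochs)]

    # Step 3: measure the error on the test data and inspect the predictions.
    test_error = trainer.testOnData(test_data)
    predictions = [net.activate(inputs)[0] for inputs, _ in XOR_SAMPLES]
    return training_errors, test_error, predictions
```

The list of per-epoch errors returned by `run_workflow()` is what would typically be plotted to follow training progress.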

Terminology

There are some important terms to know when working with PyBrain for machine learning. They are as follows −

Total Error − The error reported after the network is trained. If the error keeps changing from one iteration to the next, the network still needs time to settle. Once the error stays constant between iterations, the network has converged, and additional training will not change it.

Training data − The data used to train the PyBrain network.

Testing data − The data used to test the trained PyBrain network.

Trainer − The component that trains a neural network on the training data given to it. Whether the network has been trained properly is judged by how well it predicts the test data. The most important concepts in PyBrain training are BackpropTrainer and its trainUntilConvergence method.

BackpropTrainer − A trainer that trains the parameters of a module on a SupervisedDataSet or ClassificationDataSet (potentially sequential) by backpropagating the errors (through time).

trainUntilConvergence − A trainer method that trains the module on the dataset until it converges.

Layers − Layers are sets of functions applied at the hidden layers of a network.

Connections − A connection works much like a layer; the only difference is that it moves data from one node to another in a network.

Modules − Modules are the units a network is built from; each consists of an input buffer and an output buffer.

Supervised Learning − In this case, we have both inputs and outputs, and we use an algorithm to map the inputs to the outputs. The algorithm learns from the given training data by iterating over it, and the iteration stops when the algorithm predicts the correct output.

Unsupervised Learning − In this case, we have inputs but do not know the outputs. The role of unsupervised learning is to learn as much as possible from the given data.