PyBrain - Reinforcement Learning Module
Reinforcement Learning (RL) is an important part of Machine Learning. In reinforcement learning, the agent learns its behaviour based on inputs from the environment.
The components that interact with each other during reinforcement learning are as follows −
- Environment
- Agent
- Task
- Experiment
The layout of Reinforcement Learning is given below −

In RL, the agent interacts with the environment in iterations. At each iteration, the agent receives an observation along with a reward. It then chooses an action and sends it to the environment. At each iteration the environment moves to a new state, and the reward received each time is saved.
The goal of the RL agent is to collect as many rewards as possible. Between iterations, the agent's performance is compared with that of an agent that acts well, and the difference in performance gives rise to either reward or failure. RL is mainly used in problem-solving tasks such as robot control, elevators, telecommunications, games, etc.
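As a toy, self-contained illustration of this loop (not PyBrain code; the classes and method names below are placeholders for the ideas above), an agent collecting rewards from an environment could look like this −

import random

class ToyEnvironment:
    """Two fields; field 1 is the goal and pays a reward of 1."""
    def __init__(self):
        self.state = 0
    def observe(self):
        reward = 1 if self.state == 1 else 0   # the observation comes with a reward
        return self.state, reward
    def step(self, action):
        self.state = action                    # move to a new state

class RandomAgent:
    def act(self, observation):
        return random.choice([0, 1])           # choose an action

env, agent = ToyEnvironment(), RandomAgent()
total_reward = 0
for i in range(10):
    observation, reward = env.observe()        # receive observation and reward
    total_reward += reward                     # collect as many rewards as possible
    action = agent.act(observation)            # the agent chooses an action
    env.step(action)                           # the environment moves to a new state
print(total_reward)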
Let us take a look at how to work with RL in Pybrain.
We are going to work on a maze environment, which will be represented using a 2-dimensional numpy array where 1 is a wall and 0 is a free field. The agent's responsibility is to move over the free fields and find the goal point.
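For instance, a tiny 3×3 array of this kind (purely illustrative, not the maze used below) can be checked cell by cell −

import numpy as np

# 1 = wall, 0 = free field
small_maze = np.array([[1, 1, 1],
                       [1, 0, 1],
                       [1, 1, 1]])

print(small_maze[1][1] == 0)   # True: (1, 1) is a free field the agent may occupy
print(small_maze[0][1] == 1)   # True: (0, 1) is a wall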
Here is a step-by-step flow of working with the maze environment.
Step 1
Import the packages we need with the below code −
from scipy import *
from numpy import array   # array() is used below; imported explicitly in case scipy does not re-export it
import sys, time
import matplotlib.pyplot as pylab # for visualization we are using matplotlib
from pybrain.rl.environments.mazes import Maze, MDPMazeTask
from pybrain.rl.learners.valuebased import ActionValueTable
from pybrain.rl.agents import LearningAgent
from pybrain.rl.learners import Q, QLambda, SARSA #@UnusedImport
from pybrain.rl.explorers import BoltzmannExplorer #@UnusedImport
from pybrain.rl.experiments import Experiment
from pybrain.rl.environments import Task
Step 2
Create the maze environment using the below code −
# create the maze with walls as 1 and 0 is a free field
mazearray = array(
[[1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 0, 0, 1, 0, 0, 0, 0, 1],
[1, 0, 0, 1, 0, 0, 1, 0, 1],
[1, 0, 0, 1, 0, 0, 1, 0, 1],
[1, 0, 0, 1, 0, 1, 1, 0, 1],
[1, 0, 0, 0, 0, 0, 1, 0, 1],
[1, 1, 1, 1, 1, 1, 1, 0, 1],
[1, 0, 0, 0, 0, 0, 0, 0, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1]]
)
env = Maze(mazearray, (7, 7)) # create the environment; the first parameter is the
                              # maze array and the second is the goal field tuple
Step 3
The next step is to create the Agent.
The Agent plays an important role in RL. It will interact with the maze environment using the getAction() and integrateObservation() methods.
The agent has a controller (which maps states to actions) and a learner.
The controller in PyBrain is like a module whose input is states; it converts them into actions.
controller = ActionValueTable(81, 4)
controller.initialize(1.)
The ActionValueTable needs 2 inputs, i.e., the number of states and the number of actions. Our 9×9 maze has 81 states, and the standard maze environment has 4 actions: north, south, east, west.
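As a quick sanity check (a sketch that assumes the controller created above), the table holds one row of 4 action values per state, all set to 1.0 by initialize(1.) −

values = controller.params.reshape(81, 4)   # 81 states x 4 actions
print(values.shape)                         # (81, 4)
print(values[40])                           # the 4 action values of one state
                                            # (roughly the centre cell, assuming
                                            # row-major state numbering)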
Now we will create a learner. We are going to use the SARSA() learning algorithm for the learner to be used with the agent.
learner = SARSA()
agent = LearningAgent(controller, learner)
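Step 1 also imported Q and QLambda; as a sketch, swapping the learning algorithm is a one-line change (Q() is constructed here with its default parameters) and the rest of the tutorial stays the same −

from pybrain.rl.learners import Q

learner = Q()                                # off-policy Q-learning instead of SARSA
agent = LearningAgent(controller, learner)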
Step 4
This step adds the Agent to the Environment.
To connect the agent to the environment, we need a special component called a task. The role of a task is to look for the goal in the environment and determine how the agent gets rewarded for its actions.
Each environment has its own task. The Maze environment that we have used comes with the MDPMazeTask task. MDP stands for "Markov Decision Process", which means the agent knows its position in the maze. The environment will be a parameter to the task.
task = MDPMazeTask(env)
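Before wiring everything together in Step 5, it helps to see what a single interaction roughly looks like. The sketch below uses the task and agent created above with the standard PyBrain Task and Agent methods; the Experiment in the next step performs essentially this cycle for us −

observation = task.getObservation()      # where is the agent in the maze?
agent.integrateObservation(observation)  # the agent stores the observation
action = agent.getAction()               # the controller (and explorer) pick an action
task.performAction(action)               # apply the action to the environment
reward = task.getReward()                # reward for this step
agent.giveReward(reward)                 # feed the reward back to the learner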
Step 5
The next step after adding the agent to the environment is to create an Experiment.
Now we need to create the experiment, so that the task and the agent coordinate with each other.
experiment = Experiment(task, agent)
Now we are going to run the experiment 1000 times as shown below −
for i in range(1000):
    experiment.doInteractions(100)
    agent.learn()
    agent.reset()
When the following code gets executed, the environment runs for 100 interactions between the agent and the task −
experiment.doInteractions(100)
After each interaction, the environment passes a new state to the task, which decides what information and reward should be passed on to the agent. Inside the for loop we plot a new table after the agent has learned and been reset.
for i in range(1000):
    experiment.doInteractions(100)
    agent.learn()
    agent.reset()

    pylab.pcolor(controller.params.reshape(81,4).max(1).reshape(9,9))
    pylab.savefig("test.png")
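The plot above shows, for every maze cell, the highest action value in that state. As a related sketch, the same reshaping trick also reveals the greedy policy, i.e. which of the 4 actions currently has the largest value in each cell (the exact action ordering is defined by the maze environment) −

policy = controller.params.reshape(81, 4).argmax(1).reshape(9, 9)
print(policy)   # a 9x9 grid of action indices, one per maze cell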
Here is the full code −
Example
maze.py
from scipy import *
from numpy import array   # array() is used below; imported explicitly in case scipy does not re-export it
import sys, time
import matplotlib.pyplot as pylab
from pybrain.rl.environments.mazes import Maze, MDPMazeTask
from pybrain.rl.learners.valuebased import ActionValueTable
from pybrain.rl.agents import LearningAgent
from pybrain.rl.learners import Q, QLambda, SARSA #@UnusedImport
from pybrain.rl.explorers import BoltzmannExplorer #@UnusedImport
from pybrain.rl.experiments import Experiment
from pybrain.rl.environments import Task
# create maze array
mazearray = array(
[[1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 0, 0, 1, 0, 0, 0, 0, 1],
[1, 0, 0, 1, 0, 0, 1, 0, 1],
[1, 0, 0, 1, 0, 0, 1, 0, 1],
[1, 0, 0, 1, 0, 1, 1, 0, 1],
[1, 0, 0, 0, 0, 0, 1, 0, 1],
[1, 1, 1, 1, 1, 1, 1, 0, 1],
[1, 0, 0, 0, 0, 0, 0, 0, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1]]
)
env = Maze(mazearray, (7, 7))
# create task
task = MDPMazeTask(env)
# the controller in PyBrain is like a module, for which the input is states;
# it converts them into actions
controller = ActionValueTable(81, 4)
controller.initialize(1.)
# create agent with controller and learner - using SARSA()
learner = SARSA()
# create agent
agent = LearningAgent(controller, learner)
# create experiment
experiment = Experiment(task, agent)
# prepare plotting
pylab.gray()
pylab.ion()
for i in range(1000):
    experiment.doInteractions(100)
    agent.learn()
    agent.reset()

    pylab.pcolor(controller.params.reshape(81,4).max(1).reshape(9,9))
    pylab.savefig("test.png")