AI with Python – Reinforcement Learning
In this chapter, you will learn in detail about the concepts of reinforcement learning in AI and its implementation with Python.
Basics of Reinforcement Learning
This type of learning is used to reinforce or strengthen the network based on critic information. That is, a network being trained under reinforcement learning receives some feedback from the environment. However, the feedback is evaluative rather than instructive, as it is in the case of supervised learning. Based on this feedback, the network adjusts its weights so as to obtain better critic information in the future.
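To make the idea of evaluative feedback concrete, here is a minimal sketch (all names are illustrative, not from any library): a two-armed bandit returns only a scalar reward, never the correct action, and the agent adjusts its value estimates from that reward alone −

import random

# Evaluative feedback: the environment returns a scalar reward, never
# the correct action, and the agent adjusts its estimates from it.
values = [0.0, 0.0]   # estimated value of each action
counts = [0, 0]       # how often each action has been tried
epsilon = 0.1         # exploration rate

def pull(action):
   # Hidden reward probabilities the agent must discover.
   return 1.0 if random.random() < (0.3, 0.7)[action] else 0.0

for _ in range(1000):
   # Explore occasionally; otherwise exploit the current best estimate.
   if random.random() < epsilon:
      action = random.randrange(2)
   else:
      action = max(range(2), key=lambda a: values[a])
   reward = pull(action)
   counts[action] += 1
   # Move the estimate toward the observed reward (incremental average).
   values[action] += (reward - values[action]) / counts[action]

print(values)   # values[1] should end up near 0.7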
This learning process is similar to supervised learning, but we might have much less information available. The following figure gives the block diagram of reinforcement learning −
Building Blocks: Environment and Agent
Environment and agent are the main building blocks of reinforcement learning in AI. This section discusses them in detail −
Agent
An agent is anything that can perceive its environment through sensors and act upon that environment through effectors.
- A human agent has sensory organs such as eyes, ears, nose, tongue and skin that act as sensors, and other organs such as hands, legs and mouth that act as effectors.
- A robotic agent has cameras and infrared range finders in place of the sensors, and various motors and actuators in place of the effectors.
- A software agent has encoded bit strings as its programs and actions.
Agent Terminology
The following terms are frequently used in reinforcement learning in AI −
- Performance Measure of Agent − It is the criterion which determines how successful an agent is.
- Behavior of Agent − It is the action that the agent performs after any given sequence of percepts.
- Percept − It is the agent's perceptual input at a given instant.
- Percept Sequence − It is the history of all that an agent has perceived to date.
- Agent Function − It is a map from the percept sequence to an action.
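This terminology maps naturally onto code. The following is a hedged sketch (class and method names are hypothetical, not a standard API) of an agent that keeps a percept sequence and implements an agent function −

# Illustrative sketch of the terminology above; names are hypothetical.
class SimpleAgent:
   def __init__(self):
      self.percept_sequence = []            # history of all percepts to date

   def perceive(self, percept):
      self.percept_sequence.append(percept) # record the latest percept

   def agent_function(self):
      # Agent function: a map from the percept sequence to an action.
      # Here, a trivial rule that looks only at the latest percept.
      return 'act' if self.percept_sequence[-1] > 0 else 'wait'

agent = SimpleAgent()
for percept in [-1, 0, 2]:
   agent.perceive(percept)
   print(agent.agent_function())   # the agent's behavior after each percept

A performance measure would score this observed behavior externally; it is not part of the agent itself.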
Environment
Some programs operate in an entirely artificial environment, confined to keyboard input, databases, computer file systems and character output on a screen.
In contrast, some software agents, such as software robots or softbots, exist in rich and unlimited softbot domains. The simulator has a very detailed and complex environment, and the software agent needs to choose from a long array of actions in real time.
For example, a softbot designed to scan a customer's online preferences and show the customer interesting items works in a real as well as an artificial environment.
Properties of Environment
The environment has manifold properties, as discussed below −
- Discrete/Continuous − If there is a limited number of distinct, clearly defined states of the environment, the environment is discrete; otherwise it is continuous. For example, chess is a discrete environment and driving is a continuous environment.
- Observable/Partially Observable − If it is possible to determine the complete state of the environment at each time point from the percepts, it is observable; otherwise it is only partially observable.
- Static/Dynamic − If the environment does not change while an agent is acting, then it is static; otherwise it is dynamic.
- Single agent/Multiple agents − The environment may contain other agents, which may be of the same or a different kind than the agent.
- Accessible/Inaccessible − If the agent's sensory apparatus can have access to the complete state of the environment, then the environment is accessible to that agent; otherwise it is inaccessible.
- Deterministic/Non-deterministic − If the next state of the environment is completely determined by the current state and the actions of the agent, then the environment is deterministic; otherwise it is non-deterministic.
- Episodic/Non-episodic − In an episodic environment, each episode consists of the agent perceiving and then acting. The quality of its action depends just on the episode itself. Subsequent episodes do not depend on the actions in previous episodes. Episodic environments are much simpler because the agent does not need to think ahead.
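Some of these properties can be checked programmatically. The following hedged sketch uses the OpenAI Gym package (installed in the next section) to show that CartPole-v0 has a discrete action space but a continuous observation space −

import gym

env = gym.make('CartPole-v0')
# Discrete(2): two distinct actions (push the cart left or right).
print(env.action_space)
# Box: a continuous 4-dimensional observation space.
print(env.observation_space)
env.close()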
Constructing an Environment with Python
For building the reinforcement learning agent, we will be using the OpenAI Gym package, which can be installed with the help of the following command −
pip install gym
There are various environments in OpenAI Gym which can be used for various purposes. A few of them are CartPole-v0, Hopper-v1 and MsPacman-v0. They require different engines.
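If you want to see every environment registered with your installation, classic gym versions expose a registry; the exact call below is from that older API and may differ in later releases −

from gym import envs

# Print the ids of all registered environments (classic gym API;
# the registry interface changed in later gym releases).
for spec in envs.registry.all():
   print(spec.id)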
The following code shows an example of Python code for the CartPole-v0 environment −
import gym

# Create the CartPole-v0 environment and reset it to an initial state.
env = gym.make('CartPole-v0')
env.reset()
for _ in range(1000):
   # Render the current frame, then apply a randomly sampled action.
   env.render()
   env.step(env.action_space.sample())
env.close()
You can construct other environments in a similar way.
Constructing a Learning Agent with Python
For building the reinforcement learning agent, we will be using the OpenAI Gym package as shown below −
import gym

env = gym.make('CartPole-v0')
for _ in range(20):
   # Start a new episode and get the initial observation.
   observation = env.reset()
   for i in range(100):
      env.render()
      print(observation)
      # Sample a random action and apply it to the environment.
      action = env.action_space.sample()
      observation, reward, done, info = env.step(action)
      if done:
         print("Episode finished after {} timesteps".format(i+1))
         break
env.close()
Observe that the cartpole can balance itself.
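Random actions rarely keep the pole up for long. As a next step, here is a hedged sketch of a simple hand-coded policy (illustrative only, not a learned agent): push the cart in the direction the pole is leaning, using the pole angle component of the observation −

import gym

env = gym.make('CartPole-v0')
observation = env.reset()
total_reward = 0
for t in range(200):
   env.render()
   # observation = [cart position, cart velocity, pole angle, pole velocity]
   pole_angle = observation[2]
   # Heuristic policy: push right (1) if the pole leans right, else left (0).
   action = 1 if pole_angle > 0 else 0
   observation, reward, done, info = env.step(action)
   total_reward += reward
   if done:
      break
print("Episode reward:", total_reward)
env.close()

This heuristic typically keeps the pole up noticeably longer than random sampling, which motivates learning a policy from the reward signal itself.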