Theano - Quick Guide

Theano - Introduction

Have you developed Machine Learning models in Python? If so, you know the intricacies involved. Development is typically slow, consuming hours or even days of computation.

Machine Learning model development requires a lot of mathematical computation, mostly arithmetic on large multi-dimensional matrices. These days we use Neural Networks rather than traditional statistical techniques to build Machine Learning applications. Neural Networks need to be trained over a huge amount of data, and the training is done in batches of reasonable size, so the learning process is iterative. If the computations are not done efficiently, training the network can take several hours or even days. Optimizing the executable code is therefore highly desirable, and that is exactly what Theano provides.

Theano is a Python library that lets you define mathematical expressions used in Machine Learning, optimize these expressions, and evaluate them very efficiently by using GPUs in critical areas. In most cases it can rival typical full C implementations.

Theano was written at the LISA lab to support rapid development of efficient machine learning algorithms. It is released under a BSD license.

In this tutorial, you will learn to use the Theano library.

Theano - Installation

Theano can be installed on Windows, macOS, and Linux. The installation is straightforward in all cases. Before you install Theano, you must install its dependencies. The following is the list of dependencies −

  1. Python

  2. NumPy − Required

  3. SciPy − Required only for Sparse Matrix and special functions

  4. BLAS − Provides standard building blocks for performing basic vector and matrix operations

The optional packages that you may choose to install depending on your needs are −

  1. nose − To run Theano's test suite

  2. Sphinx − For building documentation

  3. Graphviz and pydot − To handle graphics and images

  4. NVIDIA CUDA drivers − Required for GPU code generation/execution

  5. libgpuarray − Required for GPU/CPU code generation on CUDA and OpenCL devices

We shall discuss the steps to install Theano on macOS.

MacOS Installation

To install Theano and its dependencies, use pip from the command line as follows. These are the minimal dependencies that we need in this tutorial.

$ pip install Theano
$ pip install numpy
$ pip install scipy
$ pip install pydot

You also need to install the macOS command line developer tools using the following command −

$ xcode-select --install

You will see the following screen. Click on the Install button to install the tools.

[Screenshot: install button]

On successful installation, you will see a success message on the console.

Testing the Installation

After the installation completes successfully, open a new notebook in Anaconda Jupyter. In a code cell, enter the following Python script −

Example

import theano
from theano import tensor
a = tensor.dscalar()
b = tensor.dscalar()
c = a + b
f = theano.function([a,b], c)
d = f(1.5, 2.5)
print (d)

Output

Execute the script and you should see the following output −

4.0

The screenshot of the execution is shown below for your quick reference −

[Screenshot: testing the installation]

If you get the above output, your Theano installation is successful. If not, follow the debugging instructions on the Theano download page to fix the issues.

What is Theano?

Now that you have successfully installed Theano, let us first understand what Theano is. Theano is a Python library. It lets you define, optimize, and evaluate mathematical expressions, especially the ones used in Machine Learning model development. Theano itself does not contain any pre-defined ML models; it just facilitates their development. It is especially useful when dealing with multi-dimensional arrays. It integrates seamlessly with NumPy, a fundamental and widely used package for scientific computing in Python.

Theano facilitates defining the mathematical expressions used in ML development. Such expressions generally involve matrix arithmetic, differentiation, gradient computation, and so on.

Theano first builds the entire computational graph for your model. It then compiles the graph into highly efficient code by applying several optimization techniques. The compiled code is made available through a special operation called function. We execute this function repeatedly to train a neural network. The training time is substantially reduced compared to pure Python code, and can rival a full C implementation.

We shall now go through the process of Theano development. Let us begin with how to define a mathematical expression in Theano.

Theano - A Trivial Theano Expression

Let us begin our journey with Theano by defining and evaluating a trivial expression. Consider the following expression that adds two scalars −

c = a + b

Here a and b are variables and c is the expression output. In Theano, defining and evaluating even this trivial expression takes several steps.

Let us go through the steps to evaluate the above expression.

Importing Theano

First, we need to import the Theano library in our program, which we do using the following statement −

from theano import *

Rather than importing individual modules, the above statement uses * to import all public names from the Theano package.

Declaring Variables

Next, we declare a variable called a using the following statement −

a = tensor.dscalar()

The dscalar constructor declares a double-precision (float64) scalar variable. Executing the above statement creates a symbolic variable called a in your program. Likewise, we create variable b using the following statement −

b = tensor.dscalar()

Defining Expression

Next, we define our expression that operates on the two variables a and b.

c = a + b

In Theano, executing the above statement does not perform the scalar addition of a and b. It only builds a symbolic expression c that describes the addition.
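The idea that c = a + b records an operation instead of computing a number can be sketched in plain Python. This is a toy illustration only and not how Theano is implemented; the Node class and its evaluate method are hypothetical names introduced for this sketch.

```python
# Toy illustration only: this is not how Theano is implemented.
class Node:
    """A symbolic value: either a named input or the sum of two nodes."""

    def __init__(self, name=None, operands=None):
        self.name = name
        self.operands = operands

    def __add__(self, other):
        # Building the expression records the operation; nothing is computed yet.
        return Node(operands=(self, other))

    def evaluate(self, values):
        # Leaf nodes look up their value; sum nodes evaluate both operands.
        if self.operands is None:
            return values[self.name]
        left, right = self.operands
        return left.evaluate(values) + right.evaluate(values)


a = Node("a")
b = Node("b")
c = a + b                                    # c is a graph node, not a number
result = c.evaluate({"a": 3.5, "b": 5.5})    # values are supplied only here
print(type(c).__name__)
print(result)
```

In this sketch the addition happens only inside evaluate, which plays roughly the role that the compiled Theano function plays in the steps that follow.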

Defining Theano Function

To evaluate the above expression, we need to define a function in Theano as follows −

f = theano.function([a,b], c)

The function function takes two arguments: the first is the list of inputs and the second is the output. Here the inputs are given as a list of the two variables a and b, and the output is the scalar expression c. The compiled function is referenced by the variable name f in the rest of the code.

Invoking Theano Function

The call to the function f is made using the following statement −

d = f(3.5, 5.5)

The function is called with two scalar arguments, 3.5 and 5.5, which are bound to a and b respectively. The result is assigned to the variable d. To print the contents of d, we use the print statement −

print (d)

This prints the value of d on the console, which is 9.0 in this case.

Full Program Listing

The complete program listing is given here for your quick reference −

from theano import *
a = tensor.dscalar()
b = tensor.dscalar()
c = a + b
f = theano.function([a,b], c)
d = f(3.5, 5.5)
print (d)

Execute the above code and you will see the output as 9.0. The screenshot is shown here −

[Screenshot: full program]

Now, let us discuss a slightly more complex example that computes the product of two matrices.

Theano - Expression for Matrix Multiplication

We will compute the dot product of two matrices. The first matrix is of dimension 2 x 3 and the second one is of dimension 3 x 2, so their product is of dimension 2 x 2.

Declaring Variables

To write a Theano expression for the above, we first declare two variables to represent our matrices as follows −

a = tensor.dmatrix()
b = tensor.dmatrix()

dmatrix is the type of matrices of doubles. Note that we do not specify the matrix size anywhere, so these variables can represent matrices of any shape.

Defining Expression

To compute the dot product, we use the built-in function called dot as follows −

c = tensor.dot(a,b)

The result of the multiplication is the symbolic matrix expression c.

Defining Theano Function

Next, we define a function as in the earlier example to evaluate the expression.

f = theano.function([a,b], c)

Note that the inputs to the function are the two variables a and b, which are of matrix type. The function output is the expression c, which is automatically of matrix type.

Invoking Theano Function

We now invoke the function using the following statement −

d = f([[0, -1, 2], [4, 11, 2]], [[3, -1],[1,2], [6,1]])

The two arguments in the above statement are nested Python lists, which Theano converts to NumPy arrays. You may also pass NumPy arrays explicitly, as shown here −

import numpy
d = f(numpy.array([[0, -1, 2], [4, 11, 2]]),
      numpy.array([[3, -1], [1, 2], [6, 1]]))

After d is computed, we print its value −

print (d)

You will see the following output −

[[11.  0.]
 [35. 20.]]
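The product can also be checked by hand without Theano. The following is a minimal plain-Python sketch that uses the same input matrices as the call above; the helper name matmul is introduced here for illustration only.

```python
def matmul(left, right):
    """Multiply two matrices given as nested lists (rows of numbers)."""
    inner = len(right)
    assert all(len(row) == inner for row in left), "shape mismatch"
    cols = len(right[0])
    return [
        [sum(row[k] * right[k][j] for k in range(inner)) for j in range(cols)]
        for row in left
    ]


A = [[0, -1, 2], [4, 11, 2]]      # 2 x 3
B = [[3, -1], [1, 2], [6, 1]]     # 3 x 2
product = matmul(A, B)
print(product)                    # [[11, 0], [35, 20]]
```

For example, the entry in the second row and first column is 4 * 3 + 11 * 1 + 2 * 6 = 35.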

Full Program Listing

The complete program listing is given here −

from theano import *
a = tensor.dmatrix()
b = tensor.dmatrix()
c = tensor.dot(a,b)
f = theano.function([a,b], c)
d = f([[0, -1, 2],[4, 11, 2]], [[3, -1],[1,2],[6,1]])
print (d)

The screenshot of the program execution is shown here −

[Screenshot: program execution]

Theano - Computational Graph

From the above two examples, you may have noticed that in Theano we create an expression which is eventually evaluated using the Theano function. Theano uses advanced optimization techniques to optimize the execution of an expression. To visualize the computational graph, Theano provides a printing module in its library.

Symbolic Graph for Scalar Addition

To see the computational graph for our scalar addition program, use the printing module as follows −

theano.printing.pydotprint(f, outfile="scalar_addition.png", var_with_name_simple=True)

When you execute this statement, a file called scalar_addition.png is created on your machine. The saved computational graph is displayed here for your quick reference −

[Figure: scalar addition]

The complete program listing to generate the above image is given below −

from theano import *
a = tensor.dscalar()
b = tensor.dscalar()
c = a + b
f = theano.function([a,b], c)
theano.printing.pydotprint(f, outfile="scalar_addition.png", var_with_name_simple=True)

Symbolic Graph for Matrix Multiplier

Now, try creating the computational graph for our matrix multiplier. The complete listing for generating this graph is given below −

from theano import *
a = tensor.dmatrix()
b = tensor.dmatrix()
c = tensor.dot(a,b)
f = theano.function([a,b], c)
theano.printing.pydotprint(f, outfile="matrix_dot_product.png", var_with_name_simple=True)

The generated graph is shown here −

[Figure: matrix multiplier]

Complex Graphs

In larger expressions, the computational graphs can be very complex. One such graph taken from the Theano documentation is shown here −

[Figure: complex graphs]

To understand how Theano works, it is important to first appreciate the significance of these computational graphs. With this understanding, we can appreciate the importance of Theano.

Why Theano?

By looking at the complexity of the computational graphs, you can now understand the purpose behind developing Theano. A typical compiler applies only local optimizations, as it never looks at the entire computation as a single unit.

Theano implements very advanced optimization techniques to optimize the full computational graph. It combines aspects of algebra with aspects of an optimizing compiler. Parts of the graph may be compiled into C code. For repeated calculations, evaluation speed is critical, and Theano meets this need by generating very efficient code.
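To give a feel for what an algebraic optimization on a whole expression looks like, here is a toy sketch in plain Python. It is not Theano's optimizer; the nested-tuple representation and the simplify function are assumptions made for this illustration, and only two rewrite rules are applied.

```python
def simplify(expr):
    """Apply x * 1 -> x and x + 0 -> x bottom-up on a nested-tuple expression."""
    if not isinstance(expr, tuple):
        return expr                      # a variable name or a constant
    op, left, right = expr
    left, right = simplify(left), simplify(right)
    if op == "mul":
        if left == 1:
            return right
        if right == 1:
            return left
    if op == "add":
        if left == 0:
            return right
        if right == 0:
            return left
    return (op, left, right)


# (x + 0) * (1 + 0) reduces to x once the whole expression is visible.
expr = ("mul", ("add", "x", 0), ("add", 1, 0))
simplified = simplify(expr)
print(simplified)
```

Because the rewrite sees the entire expression tree, simplifying the inner sums exposes the outer multiplication by 1, which a purely local view of each operation would miss.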

Theano - Data Types

Now that you have understood the basics of Theano, let us look at the different data types available to you for creating your expressions. The following table gives a partial list of the data types defined in Theano.

Data type          Theano type

Byte               bscalar, bvector, bmatrix, brow, bcol, btensor3, btensor4, btensor5, btensor6, btensor7

16-bit integers    wscalar, wvector, wmatrix, wrow, wcol, wtensor3, wtensor4, wtensor5, wtensor6, wtensor7

32-bit integers    iscalar, ivector, imatrix, irow, icol, itensor3, itensor4, itensor5, itensor6, itensor7

64-bit integers    lscalar, lvector, lmatrix, lrow, lcol, ltensor3, ltensor4, ltensor5, ltensor6, ltensor7

float              fscalar, fvector, fmatrix, frow, fcol, ftensor3, ftensor4, ftensor5, ftensor6, ftensor7

double             dscalar, dvector, dmatrix, drow, dcol, dtensor3, dtensor4, dtensor5, dtensor6, dtensor7

complex            cscalar, cvector, cmatrix, crow, ccol, ctensor3, ctensor4, ctensor5, ctensor6, ctensor7

The above list is not exhaustive; refer to the tensor creation documentation for a complete list.
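The type names in the table follow a regular pattern: a one-letter prefix for the element type followed by a suffix for the shape. The sketch below spells this pattern out in plain Python. The dtype names on the right-hand side of the dictionary are an assumption about what each prefix stands for (in particular, complex is assumed to mean complex64), and the helper constructor_names is hypothetical.

```python
# Prefix letters as used in the table above. The dtype names are an assumed
# mapping and are not taken from the table itself.
PREFIX_TO_DTYPE = {
    "b": "int8", "w": "int16", "i": "int32", "l": "int64",
    "f": "float32", "d": "float64", "c": "complex64",
}
SUFFIXES = ["scalar", "vector", "matrix", "row", "col",
            "tensor3", "tensor4", "tensor5", "tensor6", "tensor7"]


def constructor_names(prefix):
    """List the constructor names for one element-type prefix."""
    return [prefix + suffix for suffix in SUFFIXES]


names = constructor_names("d")
print(PREFIX_TO_DTYPE["d"], names[:3])
```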

I will now give you a few examples of how to create variables of various kinds of data in Theano.

Scalar

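To construct a scalar variable, you would use the following syntax. The variable is symbolic, so a value is supplied only when the expression is evaluated −

Syntax

import theano
x = theano.tensor.scalar('x')
print (x.eval({x: 5.0}))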

Output

5.0

One-dimensional Array

To create a one-dimensional array, use the following declaration −

Example

import theano
f = theano.tensor.vector('f')
values = f.eval({f: [2.0, 5.0, 3.0]})
print (values)
print (values[0])
print (values[2])

Output

[2. 5. 3.]
2.0
3.0

If you access values[3], it generates an index out of range error as shown here −

print (values[3])

Output

IndexError: index 3 is out of bounds for axis 0 with size 3

Two-dimensional Array

To declare a two-dimensional array, you would use the following code snippet −

Example

import theano
m = theano.tensor.matrix('m')
values = m.eval({m: [[2.0, 3.0], [4.0, 5.0], [2.0, 4.0]]})
print (values[0])
print (values[1][0])

Output

[2. 3.]
4.0

5-dimensional Array

To declare a 5-dimensional array, use the following syntax −

Example

import numpy
import theano
m5 = theano.tensor.tensor5('m5')
data = numpy.arange(15, dtype=m5.dtype).reshape(1, 1, 1, 3, 5)
values = m5.eval({m5: data})
print (values[0, 0, 0, 1])
print (values[0, 0, 0, 2, 3])

Output

[5. 6. 7. 8. 9.]
13.0

You may declare a 3-dimensional array by using the data type tensor3 in place of tensor5, a 4-dimensional array by using the data type tensor4, and so on up to tensor7.

Plural Constructors

Sometimes, you may want to create several variables of the same type in a single declaration. You can do so by using the following syntax −

Syntax

from theano.tensor import dmatrices
x, y, z = dmatrices('x', 'y', 'z')
print (x.eval({x: [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]})[2])
print (y.eval({y: [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]})[1])
print (z.eval({z: [[13.0, 14.0], [15.0, 16.0], [17.0, 18.0]]})[0])

Output

[5. 6.]
[ 9. 10.]
[13. 14.]

Theano - Variables

In the previous chapter, while discussing the data types, we created and used Theano variables. To reiterate, we use the following syntax to create a variable in Theano −

x = theano.tensor.fvector('x')

In this statement, we have created a variable x of type vector containing 32-bit floats. We have also named it x. Names are generally useful for debugging.

To declare a vector of 32-bit integers, you would use the following syntax −

i32 = theano.tensor.ivector()

Here, we do not specify a name for the variable.

To declare a three-dimensional tensor consisting of 64-bit floats, you would use the following declaration −

f64 = theano.tensor.dtensor3()

The various constructors along with their data types and dimensions are listed in the table below −

Constructor    Data type    Dimensions

fvector        float32      1

ivector        int32        1

fscalar        float32      0

fmatrix        float32      2

ftensor3       float32      3

dtensor3       float64      3

You may use a generic vector constructor and specify the data type explicitly as follows −

x = theano.tensor.vector('x', dtype='int32')

In the next chapter, we will learn how to create shared variables.

Theano - Shared Variables

Many times, you need to create variables that are shared between different functions and also between multiple calls to the same function. For example, while training a neural network you create a weights vector that assigns a weight to each feature under consideration. This vector is modified on every iteration during training, so it has to persist across multiple calls to the same function. We create a shared variable for this purpose. Typically, Theano moves such shared variables to the GPU, provided one is available. This speeds up the computation.

Syntax

To create a shared variable, you use the following syntax −

import numpy
import theano
W = theano.shared(numpy.asarray([0.1, 0.25, 0.15, 0.3]), 'W')

Example

Here a shared variable is created from a NumPy array of four floating point numbers. To set or get the value of W, you would use the following code snippet −

import numpy
import theano
W = theano.shared(numpy.asarray([0.1, 0.25, 0.15, 0.3]), 'W')
print ("Original: ", W.get_value())
print ("Setting new values (0.5, 0.2, 0.4, 0.2)")
W.set_value([0.5, 0.2, 0.4, 0.2])
print ("After modifications:", W.get_value())

Output

Original: [0.1 0.25 0.15 0.3 ]
Setting new values (0.5, 0.2, 0.4, 0.2)
After modifications: [0.5 0.2 0.4 0.2]
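What it means for a value to persist between calls can be sketched without Theano. The class below is a toy stand-in that only mimics the get_value and set_value names used above; SharedValue and make_step are hypothetical names, and none of this reflects how Theano stores shared variables internally.

```python
class SharedValue:
    """Toy stand-in for a shared variable: a value that outlives each call."""

    def __init__(self, value):
        self._value = list(value)

    def get_value(self):
        return list(self._value)

    def set_value(self, new_value):
        self._value = list(new_value)


def make_step(shared, delta):
    """Return a function that adds delta to every element on each call."""
    def step():
        shared.set_value([v + delta for v in shared.get_value()])
        return shared.get_value()
    return step


weights = SharedValue([1.0, 2.0])
step = make_step(weights, 0.5)
step()
step()
print(weights.get_value())    # [2.0, 3.0]
```

Each call to step sees the result of the previous call, which is the behaviour a training loop relies on when it updates the weights iteration after iteration.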

Theano - Functions

A Theano function acts as a hook for interacting with the symbolic graph. When a function is created, the symbolic graph is compiled into highly efficient executable code. Theano achieves this by restructuring mathematical expressions to make them faster, compiling some parts of the expression into C code, moving some tensors to the GPU, and so on.

The compiled code is wrapped in the Theano function. When you execute the function, it computes the outputs we specified and returns them. The optimization mode may be specified as FAST_COMPILE or FAST_RUN, using the environment variable THEANO_FLAGS.
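As a minimal sketch, the environment variable can be set from Python before Theano is imported. The key name mode used in the flag string is an assumption here, as is the expectation that Theano reads the variable at import time; check the Theano configuration documentation for the exact keys supported by your version.

```python
import os

# Assumption: "mode" is the configuration key that selects FAST_COMPILE or FAST_RUN.
# The variable is set before Theano is imported so that it can take effect.
os.environ["THEANO_FLAGS"] = "mode=FAST_COMPILE"

# import theano   # uncomment where Theano is installed
print(os.environ["THEANO_FLAGS"])
```

The same setting can equally be exported in the shell before starting Python.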

A Theano function is declared using the following syntax −

f = theano.function ([x], y)

The first parameter [x] is the list of input variables and the second parameter y is the output variable. A list of variables may be given instead to return several outputs.

Having now understood the basics of Theano, let us begin Theano coding with a trivial example.

Theano - Trivial Training Example

Theano is quite useful in training neural networks, where we have to repeatedly calculate the cost and gradients to reach an optimum. On large datasets, this becomes computationally intensive. Theano does this efficiently thanks to the internal optimizations of the computational graph that we saw earlier.

Problem Statement

We shall now learn how to use the Theano library to train a network. We take a simple case where we start with a four-feature dataset. We compute the sum of these features after applying a certain weight (importance) to each feature.

The goal of the training is to modify the weights assigned to each feature so that the sum reaches a target value of 100.

sum = f1 * w1 + f2 * w2 + f3 * w3 + f4 * w4

Where f1, f2, ... are the feature values and w1, w2, ... are the weights.

Let us put numbers on the example for a better understanding of the problem statement. We assume a value of 1.0 for each feature and take w1 = 0.1, w2 = 0.25, w3 = 0.15, and w4 = 0.3. There is no definite logic behind these weight values; they are just our intuition. Thus, the initial sum is as follows −

sum = 1.0 * 0.1 + 1.0 * 0.25 + 1.0 * 0.15 + 1.0 * 0.3

This sums to 0.8. Now, we keep modifying the weights so that this sum approaches 100. The current value of 0.8 is far from our desired target of 100. In Machine Learning terms, we define the cost as the difference between the target value and the current output value, typically squared to magnify large errors. We reduce this cost in each iteration by calculating the gradients and updating our weights vector.

Let us see how this entire logic is implemented in Theano.

Declaring Variables

We first declare our input vector x as follows −

x = tensor.fvector('x')

Here x is a one-dimensional array of float values.

We define a scalar target variable as given below −

target = tensor.fscalar('target')

Next, we create a weights tensor W with the initial values discussed above −

W = theano.shared(numpy.asarray([0.1, 0.25, 0.15, 0.3]), 'W')

Defining Theano Expression

We now calculate the output using the following expression −

y = (x * W).sum()

Note that in the above statement x and W are vectors and not simple scalar variables. We now calculate the error (cost) with the following expression −

cost = tensor.sqr(target - y)

The cost is the squared difference between the target value and the current output.

To calculate the gradient, which tells us how the cost changes with respect to each weight, we use the built-in grad method as follows −

gradients = tensor.grad(cost, [W])

We now compute the updated weights vector, using a learning rate of 0.1, as follows −

W_updated = W - (0.1 * gradients[0])

Next, we need to tell Theano to replace the weights vector with the above values on every call. We do this in the following statement −

updates = [(W, W_updated)]
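The update rule can be checked by hand in plain Python. The sketch below assumes the same cost as above, cost = (target - y)^2 with y = sum(x_i * w_i), whose derivative with respect to w_i is -2 * (target - y) * x_i. The function name train is introduced here for illustration and is not part of Theano.

```python
def train(features, weights, target, learning_rate=0.1, iterations=3):
    """Gradient descent on cost = (target - sum(x_i * w_i)) ** 2."""
    history = []
    for _ in range(iterations):
        output = sum(x * w for x, w in zip(features, weights))
        error = target - output
        # d(cost)/d(w_i) = -2 * (target - output) * x_i
        gradients = [-2.0 * error * x for x in features]
        weights = [w - learning_rate * g for w, g in zip(weights, gradients)]
        history.append((output, weights))
    return history


history = train([1.0, 1.0, 1.0, 1.0], [0.1, 0.25, 0.15, 0.3], 100.0)
for output, weights in history:
    print(round(output, 4), [round(w, 4) for w in weights])
```

With every feature equal to 1.0, the first update adds 0.1 * 2 * (100 - 0.8) = 19.84 to each weight.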

Defining/Invoking Theano Function

Lastly, we define a function in Theano that computes the sum and applies the weight update on every call.

f = function([x, target], y, updates=updates)

To invoke the above function a certain number of times, we create a for loop as follows −

for i in range(10):
   output = f([1.0, 1.0, 1.0, 1.0], 100.0)

As said earlier, the input to the function is a vector containing the values of the four features. We assign the value 1.0 to each feature for no specific reason; you may assign different values of your choice and check whether the function still converges. We print the values of the weight vector and the corresponding output in each iteration, inside the loop, as shown in the code below −

   print ("iteration: ", i)
   print ("Modified Weights: ", W.get_value())
   print ("Output: ", output)

Full Program Listing

The complete program listing is reproduced here for your quick reference −

from theano import *
import numpy

x = tensor.fvector('x')
target = tensor.fscalar('target')

W = theano.shared(numpy.asarray([0.1, 0.25, 0.15, 0.3]), 'W')
print ("Weights: ", W.get_value())

y = (x * W).sum()
cost = tensor.sqr(target - y)
gradients = tensor.grad(cost, [W])
W_updated = W - (0.1 * gradients[0])
updates = [(W, W_updated)]

f = function([x, target], y, updates=updates)
for i in range(10):
   output = f([1.0, 1.0, 1.0, 1.0], 100.0)
   print ("iteration: ", i)
   print ("Modified Weights: ", W.get_value())
   print ("Output: ", output)

When you run the program, you will see the following output −

Weights: [0.1 0.25 0.15 0.3 ]
iteration: 0
Modified Weights: [19.94 20.09 19.99 20.14]
Output: 0.8
iteration: 1
Modified Weights: [23.908 24.058 23.958 24.108]
Output: 80.16000000000001
iteration: 2
Modified Weights: [24.7016 24.8516 24.7516 24.9016]
Output: 96.03200000000001
iteration: 3
Modified Weights: [24.86032 25.01032 24.91032 25.06032]
Output: 99.2064
iteration: 4
Modified Weights: [24.892064 25.042064 24.942064 25.092064]
Output: 99.84128
iteration: 5
Modified Weights: [24.8984128 25.0484128 24.9484128 25.0984128]
Output: 99.968256
iteration: 6
Modified Weights: [24.89968256 25.04968256 24.94968256 25.09968256]
Output: 99.9936512
iteration: 7
Modified Weights: [24.89993651 25.04993651 24.94993651 25.09993651]
Output: 99.99873024
iteration: 8
Modified Weights: [24.8999873 25.0499873 24.9499873 25.0999873]
Output: 99.99974604799999
iteration: 9
Modified Weights: [24.89999746 25.04999746 24.94999746 25.09999746]
Output: 99.99994920960002

Observe that the output printed at iteration 5 is 99.97 and at iteration 6 it is 99.99, which is close to our desired target of 100.0. Each printed output is computed before the weights are updated in that iteration.
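The speed of convergence in this example can also be worked out directly. With learning rate 0.1 and four features equal to 1.0, each update multiplies the remaining error by 1 - 2 * 0.1 * 4 = 0.2, so the output printed at iteration k is 100 - 99.2 * 0.2^k. The sketch below evaluates this closed form; the function name output_at and its parameter names are introduced here for illustration.

```python
def output_at(iteration, initial_output=0.8, target=100.0,
              learning_rate=0.1, squared_norm=4.0):
    """Output printed at the given iteration, from the closed form."""
    # Each update shrinks the error by this factor (0.2 in this example).
    shrink = 1.0 - 2.0 * learning_rate * squared_norm
    return target - (target - initial_output) * shrink ** iteration


for k in range(7):
    print(k, round(output_at(k), 6))
```

Changing the learning rate or the feature values changes the shrink factor; if its magnitude reaches 1 or more, the error stops decreasing.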

Depending on the desired accuracy, you may conclude that the network is trained within 5 to 6 iterations. After the training completes, look up the weights vector, which at iteration 5 takes the following values −

iteration: 5
Modified Weights: [24.8984128 25.0484128 24.9484128 25.0984128]

You may now use these values in your network for deploying the model.

Theano - Conclusion

Building Machine Learning models involves intensive and repetitive computations on tensors, which require substantial computing resources. As a regular compiler applies optimizations only at the local level, it does not generally produce fast-executing code for such workloads.

Theano first builds a computational graph for the entire computation. As the whole computation is available as a single graph during compilation, several optimization techniques can be applied before code generation, and that is exactly what Theano does. It restructures the computational graph, partly converts it into C, moves shared variables to the GPU, and so on, to generate very fast executable code. The compiled code is then executed by a Theano function, which acts as a hook for injecting the compiled code into the runtime. Theano has proved its credentials and is widely accepted in both academia and industry.