Python Deep Learning Tutorial

Computational Graphs

Deep learning frameworks such as TensorFlow, Torch, and Theano implement backpropagation by using computational graphs. More significantly, understanding backpropagation on computational graphs unifies several different algorithms and their variations, such as backpropagation through time and backpropagation with shared weights. Once everything is converted into a computational graph, they are all the same algorithm − just backpropagation on computational graphs.

What is a Computational Graph?

A computational graph is defined as a directed graph where the nodes correspond to mathematical operations. Computational graphs are a way of expressing and evaluating a mathematical expression.

For example, here is a simple mathematical equation −

p = x+y

We can draw a computational graph of the above equation as follows.

[Figure: computational graph of p = x + y]

The above computational graph has an addition node (the node with the "+" sign) with two input variables x and y and one output p.

Let us take another example, slightly more complex. We have the following equation.

g = \left (x+y \right ) \ast z

The above equation is represented by the following computational graph.

[Figure: computational graph of g = (x + y) * z]
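
As a rough illustration of this idea, a graph like the one above can be sketched in plain Python with a minimal Node class. This is only a toy representation for this tutorial, not how frameworks such as TensorFlow or Torch actually store their graphs.

    class Node:
        """A toy node in a computational graph: an operation plus its input nodes."""
        def __init__(self, op, inputs=(), value=None):
            self.op = op              # "input", "+" or "*"
            self.inputs = list(inputs)
            self.value = value        # filled in for inputs / during the forward pass

    # Build the graph for g = (x + y) * z
    x = Node("input")
    y = Node("input")
    z = Node("input")
    p = Node("+", [x, y])
    g = Node("*", [p, z])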

Computational Graphs and Backpropagation

Computational graphs and backpropagation are both important core concepts in deep learning for training neural networks.

Forward Pass

The forward pass is the procedure for evaluating the value of the mathematical expression represented by the computational graph. Doing a forward pass means passing values from the variables forward, from the left (the inputs) to the right, where the output is.

Let us consider an example by giving some values to all of the inputs. Suppose the following values are given to the inputs.

x = 1, y = 3, z = -3

By giving these values to the inputs, we can perform forward pass and get the following values for the outputs on each node.

First, we use the values x = 1 and y = 3 to get p = 4.

[Figure: forward pass through the addition node, giving p = 4]

Then we use p = 4 and z = -3 to get g = -12. We go from left to right, forwards.

[Figure: forward pass through the multiplication node, giving g = -12]
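
In code, the forward pass for this small graph is just evaluating the expression from left to right, caching the intermediate value p along the way:

    # Forward pass for g = (x + y) * z with the given inputs
    x, y, z = 1, 3, -3

    p = x + y      # p = 4
    g = p * z      # g = -12

    print(p, g)    # 4 -12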

Objectives of Backward Pass

In the backward pass, our intention is to compute the gradient of the final output with respect to each input. These gradients are essential for training the neural network using gradient descent.

For example, we desire the following gradients.

Desired gradients

\frac{\partial g}{\partial x}, \frac{\partial g}{\partial y}, \frac{\partial g}{\partial z}

Backward pass (backpropagation)

We start the backward pass by finding the derivative of the final output with respect to the final output (itself!). Thus, the result is the identity derivative and its value is equal to one.

\frac{\partial g}{\partial g} = 1

Our computational graph now looks as shown below −

[Figure: computational graph with the output gradient ∂g/∂g = 1 annotated]

Next, we will do the backward pass through the "*" operation. We will calculate the gradients at p and z. Since g = p*z, we know that −

\frac{\partial g}{\partial z} = p

\frac{\partial g}{\partial p} = z

We already know the values of z and p from the forward pass. Hence, we get −

\frac{\partial g}{\partial z} = p = 4

and

\frac{\partial g}{\partial p} = z = -3
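
These local gradients can be sanity-checked numerically with a central finite-difference approximation. The helper numerical_grad below is introduced only for this check:

    def numerical_grad(f, at, eps=1e-6):
        """Central finite-difference estimate of the derivative of f at the point 'at'."""
        return (f(at + eps) - f(at - eps)) / (2 * eps)

    p, z = 4, -3
    dg_dz = numerical_grad(lambda z_: p * z_, z)   # approximately p = 4
    dg_dp = numerical_grad(lambda p_: p_ * z, p)   # approximately z = -3
    print(round(dg_dz, 4), round(dg_dp, 4))        # 4.0 -3.0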

We want to calculate the gradients at x and y −

\frac{\partial g}{\partial x}, \frac{\partial g}{\partial y}

However, we want to do this efficiently (although x and g are only two hops away in this graph, imagine them being really far from each other). To calculate these values efficiently, we will use the chain rule of differentiation. From the chain rule, we have −

\frac{\partial g}{\partial x}=\frac{\partial g}{\partial p}\ast \frac{\partial p}{\partial x}

\frac{\partial g}{\partial y}=\frac{\partial g}{\partial p}\ast \frac{\partial p}{\partial y}

But we already know that dg/dp = -3, and dp/dx and dp/dy are easy to compute, since p directly depends on x and y. We have −

p=x+y\Rightarrow \frac{\partial p}{\partial x} = 1, \frac{\partial p}{\partial y} = 1

Hence, we get −

\frac{\partial g} {\partial x} = \frac{\partial g} {\partial p}\ast \frac{\partial p} {\partial x} = \left ( -3 \right )\cdot 1 = -3

In addition, for the input y −

\frac{\partial g} {\partial y} = \frac{\partial g} {\partial p}\ast \frac{\partial p} {\partial y} = \left ( -3 \right )\cdot 1 = -3
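
Putting the whole backward pass together, the gradients for this example can be computed by hand in a few lines of Python, reusing the values cached during the forward pass:

    # Forward pass (cache the intermediate values)
    x, y, z = 1, 3, -3
    p = x + y                  # 4
    g = p * z                  # -12

    # Backward pass (apply the chain rule from right to left)
    dg_dg = 1.0                # derivative of the output with respect to itself
    dg_dp = dg_dg * z          # -3
    dg_dz = dg_dg * p          #  4
    dg_dx = dg_dp * 1          # -3, since dp/dx = 1
    dg_dy = dg_dp * 1          # -3, since dp/dy = 1

    print(dg_dx, dg_dy, dg_dz)   # -3.0 -3.0 4.0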

The main reason for doing this backwards is that when we had to calculate the gradient at x, we only used already computed values and dp/dx (the derivative of a node's output with respect to that same node's input). We used local information to compute a global value.
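
Frameworks such as Torch (PyTorch) automate exactly this procedure. As a cross-check, assuming PyTorch is installed, the same gradients can be obtained with its autograd machinery:

    import torch

    x = torch.tensor(1.0, requires_grad=True)
    y = torch.tensor(3.0, requires_grad=True)
    z = torch.tensor(-3.0, requires_grad=True)

    g = (x + y) * z     # forward pass builds the computational graph
    g.backward()        # backward pass fills in the gradients

    print(x.grad, y.grad, z.grad)   # tensor(-3.), tensor(-3.), tensor(4.)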

Steps for training a neural network

Follow these steps to train a neural network −

  1. For each data point x in the dataset, we do a forward pass with x as input, and calculate the cost c as output.

  2. We do a backward pass starting at c, and calculate the gradients for all nodes in the graph. This includes nodes that represent the neural network weights.

  3. We then update the weights by doing W = W - learning rate * gradients.

  4. We repeat this process until the stopping criterion is met (a minimal sketch of this loop is given after the list).
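
As a rough illustration of this loop, here is a minimal gradient-descent sketch for a single weight. The toy model, data and learning rate are assumptions made up for this example, not part of the tutorial's setup:

    # Toy example: fit pred = w * x to points generated with w_true = 2
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # (x, target) pairs
    w = 0.0                                        # weight to be learned
    learning_rate = 0.05

    for epoch in range(100):
        for x, target in data:
            # Step 1: forward pass, compute the cost c
            pred = w * x
            c = (pred - target) ** 2

            # Step 2: backward pass, gradient of c with respect to w
            dc_dw = 2 * (pred - target) * x

            # Step 3: update the weight
            w = w - learning_rate * dc_dw

    # Step 4: in practice we would loop until a stopping criterion is met
    print(round(w, 3))   # close to 2.0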