Apache MXNet Tutorial
Apache MXNet - Python Packages
In this chapter, we will learn about the Python packages available in Apache MXNet.
Important MXNet Python packages
MXNet has the following important Python packages, which we will discuss one by one −
-
Autograd (Automatic Differentiation)
-
NDArray
-
KVStore
-
Gluon
-
Visualization
First, let us start with the Autograd Python package for Apache MXNet.
Autograd
Autograd stands for automatic differentiation, which is used to backpropagate gradients from the loss metric back to each of the parameters. Along with backpropagation, it uses a dynamic programming approach to calculate the gradients efficiently. It is also called reverse-mode automatic differentiation. This technique is very efficient in ‘fan-in’ situations, where many parameters affect a single loss metric.
What are gradients?
Gradients are fundamental to the process of neural network training. They basically tell us how to change the parameters of the network to improve its performance.
As we know, neural networks (NN) are composed of operators such as sums, products, convolutions, etc. These operators use parameters for their computations, such as the weights in convolution kernels. We have to find the optimal values for these parameters, and gradients show us the way and lead us to the solution.

We are interested in the effect of changing a parameter on the performance of the network, and gradients tell us how much a given variable increases or decreases when we change a variable it depends on. Performance is usually defined using a loss metric that we try to minimise. For example, for regression we might try to minimise the L2 loss between our predictions and the exact values, whereas for classification we might minimise the cross-entropy loss.
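As a brief sketch of the two loss metrics mentioned above (the values below are illustrative, not from the original text), the Gluon loss classes used later in this chapter can be evaluated directly on dummy data −
from mxnet import nd
from mxnet.gluon.loss import L2Loss, SoftmaxCrossEntropyLoss
# Regression: L2 loss between a prediction and the exact value,
# computed as 0.5 * (prediction - label)^2 per sample.
l2 = L2Loss()
print(l2(nd.array([[1.2]]), nd.array([[1.5]])))
# Classification: cross-entropy loss between class scores and the true label index.
ce = SoftmaxCrossEntropyLoss()
print(ce(nd.array([[2.0, 0.5, 0.3]]), nd.array([0])))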
Once we have calculated the gradient of each parameter with respect to the loss, we can use an optimiser, such as stochastic gradient descent, to update the parameters.
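A minimal sketch of such an update follows (the parameter values, gradient, and learning rate are illustrative, not taken from the original text) −
from mxnet import nd
w = nd.array([1.0, -2.0])      # current parameter values
grad = nd.array([2.0, -4.0])   # gradient of the loss with respect to w
lr = 0.1                       # learning rate
w = w - lr * grad              # step against the gradient: [0.8, -1.6]
print(w)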
How to calculate gradients?
We have the following options to calculate gradients −
-
Symbolic Differentiation − The very first option is symbolic differentiation, which calculates a formula for each gradient. The drawback of this method is that it quickly leads to incredibly long formulas as the network gets deeper and the operators get more complex.
-
Finite Differencing − Another option is to use finite differencing, which tries a slight change on each parameter and sees how the loss metric responds. The drawback of this method is that it is computationally expensive and may have poor numerical precision.
-
Automatic Differentiation − The solution to the drawbacks of the above methods is to use automatic differentiation to backpropagate the gradients from the loss metric back to each of the parameters. Backpropagation allows a dynamic programming approach to efficiently calculate the gradients. This method is also called reverse-mode automatic differentiation. A small sketch comparing finite differencing with automatic differentiation follows this list.
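The following is a minimal sketch (the function f and the point x = 2 are illustrative, not from the original text) comparing a finite-difference estimate with the gradient that MXNet's autograd computes by reverse-mode automatic differentiation −
from mxnet import autograd, nd

def f(x):
    return x ** 3              # df/dx = 3 * x^2, so the gradient at x = 2 is 12

x = nd.array([2.0])
x.attach_grad()                # tell autograd that we want the gradient of x
with autograd.record():        # record the operators used in the forward pass
    y = f(x)
y.backward()                   # backpropagate from y back to x

eps = 1e-4
fd = (f(nd.array([2.0 + eps])) - f(nd.array([2.0 - eps]))) / (2 * eps)
print(x.grad)                  # autograd result, approximately [12.]
print(fd)                      # finite-difference estimate, approximately [12.]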
Automatic Differentiation (autograd)
Here, we will understand in detail how autograd works. It basically works in the following two stages −
Stage 1 − This stage is called the ‘Forward Pass’ of training. As the name implies, in this stage autograd creates a record of the operators used by the network to make predictions and calculate the loss metric.
Stage 2 − This stage is called the ‘Backward Pass’ of training. As the name implies, in this stage it works backwards through this record. Going backwards, it evaluates the partial derivatives of each operator, all the way back to the network parameters.
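The two stages can be seen in a minimal sketch (the parameters a and b and the toy computation below are illustrative) −
from mxnet import autograd, nd

a = nd.array([2.0])
b = nd.array([3.0])
a.attach_grad()
b.attach_grad()

# Stage 1 − Forward Pass: autograd records the operators used to compute the loss.
with autograd.record():
    loss = a * b + a           # d(loss)/da = b + 1, d(loss)/db = a

# Stage 2 − Backward Pass: autograd works backwards through the record,
# evaluating partial derivatives all the way back to the parameters.
loss.backward()

print(a.grad)                  # [4.]
print(b.grad)                  # [2.]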

Advantages of autograd
Following are the advantages of using automatic differentiation (autograd) −
-
Flexible − The flexibility it gives us when defining our network is one of the huge benefits of using autograd. We can change the operations on every iteration. These are called dynamic graphs, which are much more complex to implement in frameworks requiring a static graph. Autograd, even in such cases, will still be able to backpropagate the gradients correctly.
-
Automatic − Autograd is automatic, i.e. the complexities of the backpropagation procedure are taken care of for you. We just need to specify which gradients we are interested in calculating.
-
Efficient − Autograd calculates the gradients very efficiently.
-
Can use native Python control flow operators − We can use native Python control flow operators such as if conditions and while loops, and autograd will still be able to backpropagate the gradients efficiently and correctly, as the sketch after this list shows.
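A minimal sketch of this follows (the starting value and the thresholds in the loop and branch are illustrative). The number of loop iterations and the branch taken both depend on the data, yet autograd still recovers the correct gradient for the path actually executed −
from mxnet import autograd, nd

x = nd.array([0.5])
x.attach_grad()

with autograd.record():
    y = x * 2
    while y.norm().asscalar() < 4:   # native Python while loop
        y = y * 2
    if y.sum().asscalar() >= 0:      # native Python if condition
        z = y
    else:
        z = -y

z.backward()
print(x.grad)                        # gradient of the path actually taken, here [8.]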
Using autograd in MXNet Gluon
Here, with the help of an example, we will see how we can use autograd in MXNet Gluon.
Implementation Example
In the following example, we will implement a regression model having two layers. After implementing it, we will use autograd to automatically calculate the gradient of the loss with respect to each of the weight parameters −
First, import autograd and the other required packages as follows −
from mxnet import autograd
import mxnet as mx
from mxnet.gluon.nn import HybridSequential, Dense
from mxnet.gluon.loss import L2Loss
Now, we need to define the network as follows −
N_net = HybridSequential()
N_net.add(Dense(units=3))
N_net.add(Dense(units=1))
N_net.initialize()
Now we need to define the loss as follows −
loss_function = L2Loss()
Next, we need to create the dummy data as follows −
x = mx.nd.array([[0.5, 0.9]])
y = mx.nd.array([[1.5]])
Now, we are ready for our first forward pass through the network. We want autograd to record the computational graph so that we can calculate the gradients. For this, we need to run the network code within the scope of the autograd.record context as follows −
with autograd.record():
    y_hat = N_net(x)
    loss = loss_function(y_hat, y)
Now, we are ready for the backward pass, which we start by calling the backward method on the quantity of interest. The quantity of interest in our example is the loss, because we are trying to calculate the gradient of the loss with respect to the parameters −
loss.backward()
Now, we have gradients for each parameter of the network, which will be used by the optimiser to update the parameter values for improved performance. Let us check out the gradients of the 1st layer as follows −
N_net[0].weight.grad()
Output
The output is as follows −
[[-0.00470527 -0.00846948]
[-0.03640365 -0.06552657]
[ 0.00800354 0.01440637]]
<NDArray 3x2 @cpu(0)>
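As a hedged sketch of how these gradients are typically consumed (this step continues the example above and is not part of the original code; the learning rate is illustrative), a gluon.Trainer can apply a stochastic gradient descent update to every parameter of the network after loss.backward() −
from mxnet import gluon

trainer = gluon.Trainer(N_net.collect_params(), 'sgd', {'learning_rate': 0.01})
trainer.step(batch_size=1)         # update every parameter using its gradient
print(N_net[0].weight.data())      # the weights of the 1st layer have now changed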
Complete implementation example
Given below is the complete implementation example.
from mxnet import autograd
import mxnet as mx
from mxnet.gluon.nn import HybridSequential, Dense
from mxnet.gluon.loss import L2Loss
N_net = HybridSequential()
N_net.add(Dense(units=3))
N_net.add(Dense(units=1))
N_net.initialize()
loss_function = L2Loss()
x = mx.nd.array([[0.5, 0.9]])
y = mx.nd.array([[1.5]])
with autograd.record():
    y_hat = N_net(x)
    loss = loss_function(y_hat, y)
loss.backward()
print(N_net[0].weight.grad())