Apache MXNet: A Concise Tutorial

Python API Autograd and Initializer

This chapter covers the autograd and initializer APIs in MXNet.


mxnet.autograd is MXNet’s autograd (automatic differentiation) API for NDArray. It has the following class −

Class: Function()

It is used for customised differentiation in autograd and is written as mxnet.autograd.Function. If, for any reason, the user does not want to use the gradients computed by the default chain rule, they can subclass the Function class of mxnet.autograd to customise how the gradient is computed. The class has two methods, namely forward() and backward().

Let us understand the working of this class with the help of the following points −

  1. First, we need to define our computation in the forward method.

  2. Then, we need to provide the customized differentiation in the backward method.

  3. Now, during gradient computation, mxnet.autograd will use the backward function defined by the user instead of the default chain-rule gradient. We can also cast to a NumPy array and back for some operations in forward as well as backward.
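The three points above can be sketched in plain Python, without MXNet. This is only an illustration of the pattern on scalars: SketchFunction and its attributes are hypothetical stand-ins, not MXNet's implementation. The sketch defines the computation in forward, supplies the derivative in backward, and then compares the user-defined gradient with a finite-difference estimate.

```python
import math


class SketchFunction:
    """Hypothetical stand-in for the forward/backward pattern (not MXNet code)."""

    def save_for_backward(self, *values):
        # Remember intermediate results that backward() will need
        self.saved_tensors = values

    def __call__(self, x):
        return self.forward(x)


class Sigmoid(SketchFunction):
    def forward(self, x):
        # Point 1: define the computation
        y = 1.0 / (1.0 + math.exp(-x))
        self.save_for_backward(y)
        return y

    def backward(self, dy):
        # Point 2: provide the customised derivative, y * (1 - y)
        (y,) = self.saved_tensors
        return dy * y * (1.0 - y)


func = Sigmoid()
x = 0.5
y = func(x)
# Point 3: the user-defined backward supplies the gradient
custom_grad = func.backward(1.0)

# Compare against a central finite difference of the forward computation
eps = 1e-6
numeric_grad = (func.forward(x + eps) - func.forward(x - eps)) / (2 * eps)
print(custom_grad, numeric_grad)
```

The two printed numbers should agree to several decimal places, which is a quick way to check a hand-written backward against its forward.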


Before using the mxnet.autograd.Function class, let’s define a stable sigmoid function with forward as well as backward methods as follows −

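import mxnet as mx

class sigmoid(mx.autograd.Function):
   def forward(self, x):
      y = 1 / (1 + mx.nd.exp(-x))
      # Save the output so that backward() can reuse it
      self.save_for_backward(y)
      return y

   def backward(self, dy):
      # d(sigmoid)/dx = y * (1 - y), scaled by the incoming gradient dy
      y, = self.saved_tensors
      return dy * y * (1-y)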

Now, the sigmoid Function class can be used as follows −

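func = sigmoid()
x = mx.nd.random.uniform(shape=(10,))
x.attach_grad()   # allocate storage for the gradient of x
with mx.autograd.record():
   m = func(x)
   m.backward()   # runs the user-defined backward
dx_grad = x.grad.asnumpy()
dx_grad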


When you run the code, you will see output similar to the following (the exact values vary because x is sampled randomly) −

array([0.21458015, 0.21291625, 0.23330082, 0.2361367 , 0.23086983,
       0.24060014, 0.20326573, 0.21093895, 0.24968489, 0.24301809],
      dtype=float32)
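As a sanity check on values like these, the sigmoid derivative y * (1 - y) can be evaluated in plain Python. The sketch below assumes the inputs are drawn uniformly from [0, 1), the default range of mx.nd.random.uniform; the seed is arbitrary. On that interval the derivative is bounded by its values at 0 and 1.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    # Same formula as the backward method: y * (1 - y)
    y = sigmoid(x)
    return y * (1.0 - y)


random.seed(0)  # arbitrary seed, only to make the sketch repeatable
xs = [random.random() for _ in range(10)]  # uniform samples in [0, 1)
grads = [sigmoid_grad(x) for x in xs]

lower = sigmoid_grad(1.0)  # smallest value of the derivative on [0, 1]
upper = sigmoid_grad(0.0)  # largest value of the derivative, 0.25
print(lower, upper)
print(all(lower <= g <= upper for g in grads))
```

Every gradient in the output shown above falls between roughly 0.197 and 0.25, which matches these bounds.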

Methods and their parameters

Following are the methods of the mxnet.autograd.Function class and the functions of the mxnet.autograd module, with their parameters −

forward(*inputs)

This method of the Function class is used for forward computation.

backward(heads[, head_grads, retain_graph, …])

This method is used for backward computation. It computes the gradients of heads with respect to previously marked variables. The backward method of a Function subclass takes as many inputs as forward has outputs and returns as many NDArrays as forward has inputs.


get_symbol(x)

This method is used to retrieve the recorded computation history as a Symbol.

grad(heads, variables[, head_grads, …])

This method computes the gradients of heads with respect to variables. Once computed, instead of storing into variable.grad, gradients will be returned as new NDArrays.


is_recording()

With the help of this method we can get the recording status (recording or not recording).


is_training()

With the help of this method we can get the training status (training or predicting).

mark_variables(variables, gradients[, grad_reqs])

This method marks NDArrays as variables for which autograd computes gradients. It is the same as calling .attach_grad() on a variable, the only difference being that with this call we can set the gradient to any value.


pause([train_mode])

This method returns a scope context to be used in a ‘with’ statement for code that does not need gradients to be calculated.


predict_mode()

This method returns a scope context to be used in a ‘with’ statement in which forward pass behavior is set to inference mode, without changing the recording state.


record([train_mode])

This method returns an autograd recording scope context to be used in a ‘with’ statement; it captures code that needs gradients to be calculated.


set_recording(is_recording)

Similar to is_recording(), but with the help of this method we can set the status to recording or not recording.


set_training(train_mode)

Similar to is_training(), but with the help of this method we can set the status to training or predicting.


train_mode()

This method returns a scope context to be used in a ‘with’ statement in which forward pass behavior is set to training mode, without changing the recording state.
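The interplay between the recording state and the training state can be sketched with plain Python context managers. This is a hypothetical model of the behavior described above, not MXNet's code: the state dictionary and the _scope helper are invented for illustration, and the defaults (train_mode=True for record, train_mode=False for pause) are assumptions about the library's defaults.

```python
from contextlib import contextmanager

# Hypothetical global flags standing in for autograd's internal state
state = {"recording": False, "training": False}


@contextmanager
def _scope(recording=None, training=None):
    # Set the requested flags, then restore the previous values on exit
    previous = dict(state)
    if recording is not None:
        state["recording"] = recording
    if training is not None:
        state["training"] = training
    try:
        yield
    finally:
        state.update(previous)


def record(train_mode=True):
    return _scope(recording=True, training=train_mode)


def pause(train_mode=False):
    return _scope(recording=False, training=train_mode)


def train_mode():
    return _scope(training=True)  # recording state is left untouched


def predict_mode():
    return _scope(training=False)  # recording state is left untouched


with record():
    inside_record = dict(state)
    with predict_mode():
        inside_predict = dict(state)
    with pause():
        inside_pause = dict(state)
after = dict(state)

print(inside_record, inside_predict, inside_pause, after)
```

In this model, predict_mode() nested inside record() switches only the training flag, while pause() switches recording off as well; both flags return to their earlier values when each scope exits.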

Implementation Example

In the example below, we use the mxnet.autograd.grad() method to compute the gradient of head with respect to variables −

x = mx.nd.ones((2,))
x.attach_grad()
with mx.autograd.record():
   z = mx.nd.elemwise_add(mx.nd.exp(x), x)
# grad() returns the gradients instead of storing them in x.grad
dx_grad = mx.autograd.grad(z, [x], create_graph=True)
dx_grad



The output is shown below −

[
[3.7182817 3.7182817]
<NDArray 2 @cpu(0)>]
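The value in this output can be checked by hand: for z = exp(x) + x, the derivative is exp(x) + 1, which at x = 1 equals e + 1. The plain-Python sketch below, which does not use MXNet, evaluates the analytic derivative and compares it with a finite-difference estimate.

```python
import math


def f(x):
    # Same computation as in the example: z = exp(x) + x
    return math.exp(x) + x


def df(x):
    # Analytic derivative: dz/dx = exp(x) + 1
    return math.exp(x) + 1.0


x = 1.0
analytic = df(x)

# Central finite difference as an independent check
eps = 1e-6
numeric = (f(x + eps) - f(x - eps)) / (2 * eps)
print(analytic, numeric)
```

Both numbers are approximately 3.71828, in agreement with the single-precision result reported by MXNet.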

We can use the mxnet.autograd.predict_mode() method to return a scope to be used in a ‘with’ statement −

# `model` and `sampling` are placeholders for user-defined callables
with mx.autograd.record():
   y = model(x)
   with mx.autograd.predict_mode():
      y = sampling(y)


mxnet.initializer is MXNet’s API for weight initializers. It has the following classes −

Classes and their parameters

Following are the classes of the mxnet.initializer module and their parameters:



Bilinear()

With the help of this class we can initialize weights for up-sampling layers.


Constant(value)

This class initializes the weights to a given value. The value can be a scalar or an NDArray that matches the shape of the parameter to be set.

FusedRNN(init, num_hidden, num_layers, mode)

As the name implies, this class initializes parameters for fused Recurrent Neural Network (RNN) layers.


InitDesc

It acts as the descriptor for the initialization pattern.


Initializer(**kwargs)

This is the base class of an initializer.


LSTMBias([forget_bias])

This class initializes all biases of an LSTMCell to 0.0, except for the forget gate, whose bias is set to a custom value.

Load(param[, default_init, verbose])

This class initializes the variables by loading data from a file or dictionary.

MSRAPrelu([factor_type, slope])

As the name implies, this class initializes the weights according to the MSRA paper.

Mixed(patterns, initializers)

It initializes the parameters using multiple initializers.


Normal([sigma])

This class initializes weights with random values sampled from a normal distribution with a mean of zero and a standard deviation (SD) of sigma.


One()

It initializes the weights of a parameter to one.

Orthogonal([scale, rand_type])

As the name implies, this class initializes the weights as an orthogonal matrix.


Uniform([scale])

It initializes weights with random values that are uniformly sampled from a given range.

Xavier([rnd_type, factor_type, magnitude])

It returns an initializer that performs “Xavier” initialization for weights.


Zero()

It initializes the weights of a parameter to zero.

Implementation Example

In the example below, we use the mxnet.init.Normal() class to create an initializer and retrieve its parameters −

init = mx.init.Normal(0.8)
init.dumps()


The output is given below −

'["normal", {"sigma": 0.8}]'
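The string above is valid JSON: a two-element list holding the initializer's name and a dictionary of its parameters. Assuming that layout, the parameters can be read back with Python's standard json module, as in this small sketch, which does not need MXNet.

```python
import json

# The string shown in the output above
dumped = '["normal", {"sigma": 0.8}]'

# Split the JSON list into the initializer name and its keyword arguments
name, kwargs = json.loads(dumped)
print(name)
print(kwargs["sigma"])
```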


init = mx.init.Xavier(factor_type="in", magnitude=2.45)
init.dumps()



The output is shown below −

'["xavier", {"rnd_type": "uniform", "factor_type": "in", "magnitude": 2.45}]'
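To make the factor_type and magnitude parameters concrete, here is a plain-Python sketch of a Xavier-style uniform initializer for a two-dimensional weight. It assumes the rule that weights are drawn from [-c, c] with c = sqrt(magnitude / factor), where factor is the fan-in, the fan-out, or their average. The function name, the (out, in) shape convention, and the seed are hypothetical choices for illustration, not MXNet's implementation.

```python
import math
import random


def xavier_uniform(shape, factor_type="in", magnitude=2.45, seed=0):
    """Hypothetical pure-Python sketch for a 2-D weight of shape (out, in)."""
    fan_out, fan_in = shape
    if factor_type == "in":
        factor = fan_in
    elif factor_type == "out":
        factor = fan_out
    elif factor_type == "avg":
        factor = (fan_in + fan_out) / 2.0
    else:
        raise ValueError("factor_type must be 'in', 'out' or 'avg'")
    # Bound of the uniform distribution (assumed rule, see the text above)
    scale = math.sqrt(magnitude / factor)
    rng = random.Random(seed)
    weights = [[rng.uniform(-scale, scale) for _ in range(fan_in)]
               for _ in range(fan_out)]
    return weights, scale


weights, scale = xavier_uniform((4, 8), factor_type="in", magnitude=2.45)
print(scale)
```

With factor_type="in", a layer with more inputs gets a smaller bound, so the initial weights shrink as the fan-in grows.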

In the example below, we use the mxnet.initializer.Mixed() class to initialize parameters using multiple initializers −

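# Assumes `module` is an existing mxnet.module.Module; Uniform(0.1) is an assumed second initializer
init = mx.initializer.Mixed(['bias', '.*'], [mx.init.Zero(), mx.init.Uniform(0.1)])
module.init_params(init)
for dictionary in module.get_params():
   for key in dictionary:
      print(dictionary[key].asnumpy())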



The output is shown below −

[[ 0.0097627 0.01856892 0.04303787]]
[ 0.]
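The idea of pairing patterns with initializers can be sketched in plain Python. The sketch assumes that each parameter name is tested against the patterns in order, that the first match wins, and that matching is anchored at the start of the name; these are assumptions about the behavior, not a copy of MXNet's code. The function name, the string labels, and the parameter names are hypothetical.

```python
import re


def pick_initializer(name, patterns, initializers):
    """Return the initializer paired with the first pattern that matches name."""
    for pattern, initializer in zip(patterns, initializers):
        if re.match(pattern, name):
            return initializer
    raise ValueError("no pattern matches parameter %r" % name)


patterns = ['bias', '.*']
initializers = ['zero', 'uniform']  # labels standing in for initializer objects

# Hypothetical parameter names, chosen only for illustration
for name in ['bias', 'weight']:
    print(name, '->', pick_initializer(name, patterns, initializers))
```

Under these assumptions, a catch-all '.*' pattern placed last gives every remaining parameter a default initializer; without it, an unmatched name raises an error.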