Apache MXNet - Gluon
Another important MXNet Python package is Gluon. In this chapter, we will be discussing this package. Gluon provides a clear, concise, and simple API for deep learning projects. It enables developers to prototype, build, and train deep learning models without sacrificing training speed.
Blocks
Blocks form the basis of more complex network designs. As the complexity of a neural network increases, we need to move from designing single neurons to designing entire layers of neurons. For example, network designs such as ResNet-152 have a fair degree of regularity because they consist of blocks of repeated layers.
Example
In the example given below, we will write code for a simple block, namely a block for a multilayer perceptron.
from mxnet import nd
from mxnet.gluon import nn

x = nd.random.uniform(shape=(2, 20))          # a batch of 2 samples with 20 features each
N_net = nn.Sequential()
N_net.add(nn.Dense(256, activation='relu'))   # hidden layer with 256 units
N_net.add(nn.Dense(10))                       # output layer with 10 units
N_net.initialize()                            # parameters are initialised lazily
N_net(x)
Output
This produces the following output:
[[ 0.09543004 0.04614332 -0.00286655 -0.07790346 -0.05130241 0.02942038
0.08696645 -0.0190793 -0.04122177 0.05088576]
[ 0.0769287 0.03099706 0.00856576 -0.044672 -0.06926838 0.09132431
0.06786592 -0.06187843 -0.03436674 0.04234696]]
<NDArray 2x10 @cpu(0)>
The following steps are needed to go from defining individual layers to defining blocks of one or more layers −
Step 1 − The block takes the data as its input.
Step 2 − Now, the block stores its state in the form of parameters. For example, in the coding example above, the block contains two dense layers and we need a place to store their parameters.
Step 3 − Next, the block invokes its forward function to perform forward propagation (also called forward computation). As part of the first forward call, the block initialises its parameters lazily.
Step 4 − Finally, the block invokes the backward function and calculates the gradients with respect to its input. Typically, this step is performed automatically, as sketched below.
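A minimal sketch of these steps, using the same kind of network as above: the forward pass is recorded with autograd, and MXNet then computes the gradients automatically on backward().

from mxnet import nd, autograd
from mxnet.gluon import nn

net = nn.Sequential()
net.add(nn.Dense(256, activation='relu'))
net.add(nn.Dense(10))
net.initialize()                        # Step 2: parameters are created (lazily)

x = nd.random.uniform(shape=(2, 20))    # Step 1: input data
with autograd.record():
    y = net(x)                          # Step 3: forward computation; shapes are inferred here
y.backward()                            # Step 4: backward pass computes the gradients
print(net[0].weight.grad().shape)       # (256, 20), inferred on the first forward call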
Sequential Block
A sequential block is a special kind of block in which the data flows through a sequence of blocks. Each block is applied to the output of the previous one, with the first block being applied to the input data itself.
Let us see how the Sequential class works −
from mxnet import nd
from mxnet.gluon import nn

class MySequential(nn.Block):
    def __init__(self, **kwargs):
        super(MySequential, self).__init__(**kwargs)
    def add(self, block):
        # register the block under its name so it can be found later
        self._children[block.name] = block
    def forward(self, x):
        # apply each child block to the output of the previous one
        for block in self._children.values():
            x = block(x)
        return x

x = nd.random.uniform(shape=(2, 20))
N_net = MySequential()
N_net.add(nn.Dense(256, activation='relu'))
N_net.add(nn.Dense(10))
N_net.initialize()
N_net(x)
Output
The output is given below −
[[ 0.09543004 0.04614332 -0.00286655 -0.07790346 -0.05130241 0.02942038
0.08696645 -0.0190793 -0.04122177 0.05088576]
[ 0.0769287 0.03099706 0.00856576 -0.044672 -0.06926838 0.09132431
0.06786592 -0.06187843 -0.03436674 0.04234696]]
<NDArray 2x10 @cpu(0)>
Custom Block
We can easily go beyond simple concatenation with the sequential block defined above. However, if we would like to make customisations, the Block class also provides the required functionality. The nn module provides the Block class as a model constructor, which we can inherit to define the model we want.
In the following example, the MLP class overrides the __init__ and forward functions of the Block class.
Let us see how it works.
class MLP(nn.Block):
    def __init__(self, **kwargs):
        super(MLP, self).__init__(**kwargs)
        self.hidden = nn.Dense(256, activation='relu') # Hidden layer
        self.output = nn.Dense(10)                     # Output layer
    def forward(self, x):
        hidden_out = self.hidden(x)
        return self.output(hidden_out)

x = nd.random.uniform(shape=(2, 20))
N_net = MLP()
N_net.initialize()
N_net(x)
Output
When you run the code, you will see the following output:
[[ 0.07787763 0.00216403 0.01682201 0.03059879 -0.00702019 0.01668715
0.04822846 0.0039432 -0.09300035 -0.04494302]
[ 0.08891078 -0.00625484 -0.01619131 0.0380718 -0.01451489 0.02006172
0.0303478 0.02463485 -0.07605448 -0.04389168]]
<NDArray 2x10 @cpu(0)>
Custom Layers
Apache MXNet’s Gluon API comes with a modest number of pre-defined layers. At some point, however, we may find that a new layer is needed. We can easily add a new layer to the Gluon API. In this section, we will see how to create a new layer from scratch.
The Simplest Custom Layer
To create a new layer in the Gluon API, we must create a class that inherits from the Block class, which provides the most basic functionality. All the pre-defined layers inherit from it, either directly or via other subclasses.
To create the new layer, the only instance method that needs to be implemented is forward(self, x). This method defines exactly what our layer will do during forward propagation. As discussed earlier, the back-propagation pass for blocks is done automatically by Apache MXNet itself.
Example
In the example below, we will define a new layer and implement its forward() method to normalise the input data by scaling it into the range [0, 1].
from __future__ import print_function
import mxnet as mx
from mxnet import nd, gluon, autograd
from mxnet.gluon.nn import Dense
mx.random.seed(1)

class NormalizationLayer(gluon.Block):
    def __init__(self):
        super(NormalizationLayer, self).__init__()
    def forward(self, x):
        # scale the input into the range [0, 1] using its global min and max
        return (x - nd.min(x)) / (nd.max(x) - nd.min(x))

x = nd.random.uniform(shape=(2, 20))
N_net = NormalizationLayer()
N_net.initialize()
N_net(x)
Output
On executing the above program, you will get the following result −
[[0.5216355 0.03835821 0.02284337 0.5945146 0.17334817 0.69329053
0.7782702 1. 0.5508242 0. 0.07058554 0.3677264
0.4366546 0.44362497 0.7192635 0.37616986 0.6728799 0.7032008
0.46907538 0.63514024]
[0.9157533 0.7667402 0.08980197 0.03593295 0.16176797 0.27679572
0.07331014 0.3905285 0.6513384 0.02713427 0.05523694 0.12147208
0.45582628 0.8139887 0.91629887 0.36665893 0.07873632 0.78268915
0.63404864 0.46638715]]
<NDArray 2x20 @cpu(0)>
Hybridisation
Hybridisation may be defined as a process used by Apache MXNet to create a symbolic graph of a forward computation. It allows MXNet to boost computation performance by optimising this symbolic graph. In fact, while examining the implementation of existing layers, we may find that a block inherits from HybridBlock rather than directly from Block.
The reasons for this are as follows −
- Allows us to write custom layers − HybridBlock allows us to write custom layers that can be used in both imperative and symbolic programming.
- Increases computation performance − HybridBlock optimises the computational symbolic graph, which allows MXNet to increase computation performance (a rough timing sketch follows this list).
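As a rough illustration of the second point, one can time the same network before and after calling hybridize(). The exact numbers depend on the hardware and the MXNet version, so treat this as a sketch rather than a benchmark:

from time import time
from mxnet import nd
from mxnet.gluon import nn

net = nn.HybridSequential()
net.add(nn.Dense(256, activation='relu'))
net.add(nn.Dense(10))
net.initialize()
x = nd.random.uniform(shape=(1000, 20))

def bench(net, x, reps=1000):
    start = time()
    for _ in range(reps):
        net(x)
    nd.waitall()            # wait for all asynchronous computation to finish
    return time() - start

print('imperative: %.4f s' % bench(net, x))
net.hybridize()             # build and cache the symbolic graph
print('hybridized: %.4f s' % bench(net, x))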
Example
In this example, we will rewrite the example layer created above using HybridBlock:
class NormalizationHybridLayer(gluon.HybridBlock):
    def __init__(self):
        super(NormalizationHybridLayer, self).__init__()
    def hybrid_forward(self, F, x):
        # F is either mxnet.ndarray or mxnet.symbol, depending on the execution mode
        return F.broadcast_div(F.broadcast_sub(x, F.min(x)),
                               F.broadcast_sub(F.max(x), F.min(x)))

layer_hybd = NormalizationHybridLayer()
layer_hybd(nd.array([1, 2, 3, 4, 5, 6], ctx=mx.cpu()))
Output
The output is stated below:
[0. 0.2 0.4 0.6 0.8 1. ]
<NDArray 6 @cpu(0)>
Hybridisation has nothing to do with computation on a GPU; one can train both hybridised and non-hybridised networks on either a CPU or a GPU.
Difference between Block and HybridBlock
If we compare the Block class with the HybridBlock class, we will see that HybridBlock already has its forward() method implemented. Instead, HybridBlock defines a hybrid_forward() method that needs to be implemented when creating a layer. The F argument is the main difference between forward() and hybrid_forward(). In the MXNet community, the F argument is referred to as a backend. F can refer either to the mxnet.ndarray API (used for imperative programming) or to the mxnet.symbol API (used for symbolic programming).
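A small sketch of this behaviour (the layer name ShowBackend is hypothetical): before hybridize(), F refers to mxnet.ndarray; on the first call after hybridize(), it refers to mxnet.symbol while the graph is being built, after which the cached graph is reused.

import mxnet as mx
from mxnet import nd, gluon

class ShowBackend(gluon.HybridBlock):
    def hybrid_forward(self, F, x):
        print(F.__name__)           # which backend module is in use
        return F.relu(x)

layer = ShowBackend()
layer(nd.array([-1.0, 2.0]))        # prints 'mxnet.ndarray' (imperative mode)
layer.hybridize()
layer(nd.array([-1.0, 2.0]))        # prints 'mxnet.symbol' while the graph is built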
How to add custom layer to a network?
Rather than being used in isolation, custom layers are typically used together with predefined layers. We can use either the Sequential or the HybridSequential container to form a sequential neural network. As discussed earlier, the Sequential container inherits from Block, and HybridSequential inherits from HybridBlock.
Example
In the example below, we will create a simple neural network with a custom layer. The output of the Dense(5) layer becomes the input of NormalizationHybridLayer, and the output of NormalizationHybridLayer becomes the input of the Dense(1) layer.
net = gluon.nn.HybridSequential()
with net.name_scope():
    net.add(Dense(5))                     # Dense(5) feeds the custom layer
    net.add(NormalizationHybridLayer())   # custom normalisation layer
    net.add(Dense(1))                     # final output layer
net.initialize(mx.init.Xavier(magnitude=2.24))
net.hybridize()
input = nd.random_uniform(low=-10, high=10, shape=(10, 2))
net(input)
Output
You will see the following output −
[[-1.1272651]
[-1.2299833]
[-1.0662932]
[-1.1805027]
[-1.3382034]
[-1.2081106]
[-1.1263978]
[-1.2524893]
[-1.1044774]
[-1.316593 ]]
<NDArray 10x1 @cpu(0)>
Custom layer parameters
In a neural network, a layer has a set of parameters associated with it. We sometimes refer to them as weights, which are the internal state of a layer. These parameters play different roles −
- Sometimes these are the ones that we want to learn during the backpropagation step.
- Sometimes these are just constants that we want to use during the forward pass.
From a programming point of view, these parameters (weights) of a block are stored and accessed via the ParameterDict class, which helps with their initialisation, updating, saving, and loading.
Example
In the example below, we will define the following two sets of parameters −
- Parameter weights − This is trainable, and its shape is unknown during the construction phase. It will be inferred on the first run of forward propagation.
- Parameter scale − This is a constant whose value doesn’t change. In contrast to the parameter weights, its shape is defined during construction.
class NormalizationHybridLayer(gluon.HybridBlock):
    def __init__(self, hidden_units, scales):
        super(NormalizationHybridLayer, self).__init__()
        with self.name_scope():
            # trainable weights; the 0 in the shape defers the input
            # dimension until the first forward pass
            self.weights = self.params.get('weights',
                                           shape=(hidden_units, 0),
                                           allow_deferred_init=True)
            # constant, non-trainable scaling factors
            self.scales = self.params.get('scales',
                                          shape=scales.shape,
                                          init=mx.init.Constant(scales.asnumpy()),
                                          differentiable=False)

    def hybrid_forward(self, F, x, weights, scales):
        normalized_data = F.broadcast_div(F.broadcast_sub(x, F.min(x)),
                                          F.broadcast_sub(F.max(x), F.min(x)))
        weighted_data = F.FullyConnected(normalized_data, weights,
                                         num_hidden=self.weights.shape[0],
                                         no_bias=True)
        scaled_data = F.broadcast_mul(scales, weighted_data)
        return scaled_data
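A brief usage sketch of this layer follows; hidden_units=5 and the constant scale nd.array([2.0]) are illustrative assumptions chosen for demonstration, not part of the layer definition above.

net = gluon.nn.HybridSequential()
with net.name_scope():
    net.add(Dense(5))
    net.add(NormalizationHybridLayer(hidden_units=5, scales=nd.array([2.0])))
    net.add(Dense(1))
net.initialize(mx.init.Xavier(magnitude=2.24))
input = nd.random_uniform(low=-10, high=10, shape=(10, 2))
print(net(input))            # the shape of 'weights' is inferred on this first call
print(net.collect_params())  # the ParameterDict holding 'weights' and 'scales'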