Apache MXNet - Quick Guide

Apache MXNet - Introduction

This chapter highlights the features of Apache MXNet and discusses the latest version of this deep learning framework.

What is MXNet?

Apache MXNet is a powerful open-source deep learning framework that helps developers build, train, and deploy deep learning models. Over the past few years, the impact of deep learning has spread from healthcare to transportation to manufacturing and, in fact, into nearly every aspect of daily life. Companies now turn to deep learning to solve hard problems such as face recognition, object detection, optical character recognition (OCR), speech recognition, and machine translation.

That is why Apache MXNet is supported by:

Large companies such as Intel, Baidu, Microsoft, and Wolfram Research.
Public cloud providers, including Amazon Web Services (AWS) and Microsoft Azure.
Major research institutes such as Carnegie Mellon, MIT, the University of Washington, and the Hong Kong University of Science & Technology.

Why Apache MXNet?

With deep learning platforms such as Torch7, Caffe, Theano, TensorFlow, Keras, and Microsoft Cognitive Toolkit already available, you might wonder why you should choose Apache MXNet. Let us look at some of the reasons:

Apache MXNet addresses one of the biggest issues of existing deep learning platforms: to work in a different programming style, one typically has to learn a different system.
With Apache MXNet, developers can exploit the full capabilities of GPUs as well as cloud computing.
Apache MXNet can accelerate numerical computation in general and places special emphasis on speeding up the development and deployment of large-scale deep neural networks (DNNs).
It gives users the capabilities of both imperative and symbolic programming.
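
To make the difference between the two styles concrete, here is a minimal sketch in plain Python. It does not use MXNet's API: the Node, Var, Const, Add, and Mul classes are hypothetical names invented for this illustration. The imperative function computes its result immediately, while the symbolic version first builds an expression graph and evaluates it later with concrete inputs.

```python
# Illustrative only: plain Python, not MXNet's API.
# Node, Var, Const, Add and Mul are hypothetical helper classes.

def imperative(a, b):
    # Imperative style: every statement runs right away.
    c = a + b
    return c * 2


class Node:
    """Base class: operators build graph nodes instead of computing."""
    def __add__(self, other):
        return Add(self, wrap(other))

    def __mul__(self, other):
        return Mul(self, wrap(other))


class Var(Node):
    """A named placeholder whose value is supplied at evaluation time."""
    def __init__(self, name):
        self.name = name

    def eval(self, env):
        return env[self.name]


class Const(Node):
    def __init__(self, value):
        self.value = value

    def eval(self, env):
        return self.value


class Add(Node):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def eval(self, env):
        return self.left.eval(env) + self.right.eval(env)


class Mul(Node):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def eval(self, env):
        return self.left.eval(env) * self.right.eval(env)


def wrap(x):
    # Turn plain numbers into graph constants.
    return x if isinstance(x, Node) else Const(x)


# Symbolic style: describe the computation first...
graph = (Var("a") + Var("b")) * 2
# ...then run it later, possibly many times with different inputs.
result = graph.eval({"a": 3, "b": 4})
```

Because the symbolic graph exists before any data does, a framework is free to analyse and optimise it ahead of execution, which is the usual motivation for offering both styles.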

Various Features

If you are looking for a flexible deep learning library for quickly developing cutting-edge research, or a robust platform for production workloads, your search may well end at Apache MXNet, because of the following features:

Distributed Training

Whether training runs on multiple GPUs or on multiple hosts, Apache MXNet lets developers make the most of their hardware, with near-linear scaling efficiency. MXNet also supports integration with Horovod, an open-source distributed deep learning framework created at Uber.

For this integration, the following are some of the common distributed APIs defined in Horovod:

horovod.broadcast()
horovod.allgather()
horovod.allreduce()
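
The semantics of these collective operations can be sketched in plain Python. The functions below are hypothetical stand-ins that simulate several workers as a list of per-worker values. They do not call Horovod, whose real API operates on tensors across processes. A sum-based allreduce is included as well, since it is the collective typically used to combine gradients.

```python
# Simulation only: each list element stands for the data held by one worker.

def broadcast(worker_values, root=0):
    """Every worker ends up with a copy of the root worker's value."""
    return [list(worker_values[root]) for _ in worker_values]


def allgather(worker_values):
    """Every worker ends up with the concatenation of all workers' lists."""
    gathered = [item for value in worker_values for item in value]
    return [list(gathered) for _ in worker_values]


def allreduce(worker_values):
    """Every worker ends up with the element-wise sum over all workers."""
    summed = [sum(column) for column in zip(*worker_values)]
    return [list(summed) for _ in worker_values]


workers = [[1, 2], [3, 4], [5, 6]]  # three simulated workers
```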

In this regard, MXNet offers us the following capabilities:

Device Placement − With MXNet we can easily specify the device (CPU or GPU) on which each data structure lives.
Automatic Differentiation − Apache MXNet automates differentiation, i.e. derivative calculations.
Multi-GPU training − MXNet allows us to scale training efficiently with the number of available GPUs.
Optimized Predefined Layers − We can code our own layers in MXNet, and the predefined layers are also optimized for speed.
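
To illustrate what automating derivative calculations means, the following sketch implements forward-mode differentiation with dual numbers in plain Python. It is only a conceptual illustration: the Dual class and the derivative helper are hypothetical, and MXNet's own automatic differentiation works on tensors and uses a different (reverse-mode) strategy.

```python
class Dual:
    """A value together with its derivative with respect to the input."""
    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def _lift(other):
        # Plain numbers are constants, so their derivative is zero.
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = self._lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = self._lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def derivative(func, x):
    """Evaluate d(func)/dx at x by seeding the input derivative with 1."""
    return func(Dual(x, 1.0)).deriv


def f(x):
    return 3 * x * x + 2 * x + 1  # analytically, f'(x) = 6x + 2
```

The caller writes only the function itself; the derivative falls out of the overloaded arithmetic, which is the convenience that automatic differentiation provides.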

Hybridization

Apache MXNet provides its users with a hybrid front end. With the Gluon Python API, it bridges the gap between its imperative and symbolic capabilities, which is done by calling the hybridize functionality.

Faster Computation

Linear operations, such as tens or hundreds of matrix multiplications, are the computational bottleneck for deep neural networks. To address this bottleneck, MXNet provides −

Optimized numerical computation for GPUs
Optimized numerical computation for distributed ecosystems
Automation of common workflows, so that standard neural networks can be expressed concisely.

Language Bindings

MXNet has deep integration with high-level languages like Python and R. It also supports other programming languages, such as −

Scala
Julia
Clojure
Java
C/C++
Perl

We do not need to learn a new programming language. Instead, MXNet, combined with its hybridization feature, allows an exceptionally smooth transition from Python to deployment in the programming language of our choice.

Latest version MXNet 1.6.0

The Apache Software Foundation (ASF) released the stable version 1.6.0 of Apache MXNet on 21st February 2020 under the Apache License 2.0. This is the last MXNet release to support Python 2, as the MXNet community voted to drop Python 2 support in later releases. Let us look at some of the new features this release brings.

NumPy-Compatible interface

Due to its flexibility and generality, NumPy has been widely used by machine learning practitioners, scientists, and students. However, as hardware accelerators like Graphics Processing Units (GPUs) have become increasingly integrated into machine learning (ML) toolkits, NumPy users who want to take advantage of GPU speed have had to switch to new frameworks with different syntax.

With MXNet 1.6.0, Apache MXNet is moving toward a NumPy-compatible programming experience. The new interface offers practitioners familiar with NumPy syntax equivalent usability and expressiveness. In addition, MXNet 1.6.0 lets existing NumPy-style code use hardware accelerators like GPUs to speed up large-scale computations.

Integration with Apache TVM

Apache TVM is an open-source, end-to-end deep learning compiler stack for hardware backends such as CPUs, GPUs, and specialized accelerators. It aims to fill the gap between productivity-focused deep learning frameworks and performance-oriented hardware backends. With MXNet 1.6.0, users can leverage Apache (incubating) TVM to implement high-performance operator kernels in Python. The two main advantages of this new feature are as follows −

Simplifies the former C++ based development process.
Enables sharing the same implementation across multiple hardware backends such as CPUs, GPUs, etc.

Improvements on existing features

Apart from the features listed above, MXNet 1.6.0 also improves some existing features. The improvements are as follows −

Grouping element-wise operation for GPU

The performance of element-wise operations is bound by memory bandwidth, which is why chaining such operations may reduce overall performance. Apache MXNet 1.6.0 performs element-wise operation fusion, generating just-in-time fused operations whenever possible. This fusion also reduces storage needs and improves overall performance.
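
The effect of fusion can be sketched in plain Python. The unfused version below makes one pass over the data per operation and materialises an intermediate list each time, while the fused version applies all operations in a single pass. This is only an analogy for the memory-traffic argument: the function names are hypothetical, and MXNet generates fused GPU kernels rather than Python loops.

```python
def unfused(xs):
    # Three separate element-wise passes, each creating a temporary list.
    doubled = [x * 2.0 for x in xs]
    shifted = [d + 1.0 for d in doubled]
    return [max(s, 0.0) for s in shifted]


def fused(xs):
    # One pass: each element is read once and written once,
    # and no intermediate lists are stored.
    return [max(x * 2.0 + 1.0, 0.0) for x in xs]


data = [-2.0, -0.5, 0.0, 1.5]
```

Both functions return the same values; the difference lies only in how many times the data is traversed and how many temporaries are allocated.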

Simplifying common expressions

MXNet 1.6.0 eliminates redundant expressions and simplifies common expressions. This enhancement also improves memory usage and total execution time.
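
As a rough illustration of eliminating redundant expressions, the sketch below evaluates a small expression tree while caching each distinct subexpression, so a repeated subexpression is computed only once. The tuple-based expression format and the evaluate function are assumptions made for this example. They do not reflect MXNet's internal graph representation.

```python
def evaluate(expr, env, cache, counter):
    """Evaluate a nested-tuple expression, reusing repeated subexpressions.

    expr is a variable name (str), a number, or a tuple (op, left, right)
    with op being "+" or "*". counter["ops"] counts operations actually run.
    """
    if isinstance(expr, str):
        return env[expr]
    if not isinstance(expr, tuple):
        return expr
    if expr in cache:  # same subexpression seen before: reuse its value
        return cache[expr]
    op, left, right = expr
    a = evaluate(left, env, cache, counter)
    b = evaluate(right, env, cache, counter)
    counter["ops"] += 1
    result = a + b if op == "+" else a * b
    cache[expr] = result
    return result


# (x + y) * (x + y): the sum appears twice but is computed only once.
shared = ("+", "x", "y")
expr = ("*", shared, shared)
counter = {"ops": 0}
value = evaluate(expr, {"x": 2, "y": 3}, {}, counter)
```

Without the cache, the same expression would execute three operations instead of two.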

Optimizations

MXNet 1.6.0 also provides various optimizations to existing features and operators, which are as follows:

Automatic Mixed Precision
Gluon Fit API
MKL-DNN
Large tensor Support
TensorRT integration
Higher-order gradient support
Operators
Operator performance profiler
ONNX import/export
Improvements to Gluon APIs
Improvements to Symbol APIs
More than 100 bug fixes

Apache MXNet - Installing MXNet

To get started with MXNet, the first thing we need to do is install it on our computer. Apache MXNet works on nearly all common platforms, including Windows, Mac, and Linux.

Linux OS

We can install MXNet on Linux OS in the following ways −

Graphics Processing Unit (GPU)

Here, we will use the Pip, Docker, and Source methods to install MXNet when we are using a GPU for processing −

By using Pip method

You can use the following command to install MXNet on your Linux OS −

pip install mxnet-cu101

Apache MXNet also offers MKL pip packages, which are much faster when running on Intel hardware. For example, the package name mxnet-cu101mkl means that −

The package is built with CUDA/cuDNN
The package is MKL-DNN enabled
The CUDA version is 10.1
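
The naming convention described above can be decoded mechanically. The helper below is a hypothetical sketch that assumes package names follow the pattern mxnet[-cuXYZ][mkl] and that the CUDA version is formed by placing a dot before the last digit of the suffix, as in the cu101 example. It is not an official MXNet utility, and names outside this pattern are rejected.

```python
import re

# Assumed pattern: "mxnet", optionally followed by "-", a "cu<digits>"
# CUDA tag, and/or an "mkl" tag.
_PATTERN = re.compile(r"^mxnet(?:-(?:cu(?P<cuda>\d{2,3}))?(?P<mkl>mkl)?)?$")


def describe_package(name):
    """Decode an MXNet pip package name into its build features."""
    match = _PATTERN.match(name)
    if match is None or name.endswith("-"):
        raise ValueError(f"unrecognised package name: {name!r}")
    digits = match.group("cuda")
    # "101" -> "10.1", "92" -> "9.2"
    cuda_version = f"{digits[:-1]}.{digits[-1]}" if digits else None
    return {
        "cuda": cuda_version is not None,
        "cuda_version": cuda_version,
        "mkl_dnn": match.group("mkl") is not None,
    }
```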

For other options, you can also refer to https://pypi.org/project/mxnet/.

By using Docker

You can find Docker images with MXNet on DockerHub at https://hub.docker.com/u/mxnet. Let us go through the steps below to install MXNet by using Docker with GPU −

Step 1 − First, install Docker on our machine by following the Docker installation instructions available at https://docs.docker.com/engine/install/ubuntu/.

Step 2 − Next, to enable the use of GPUs from Docker containers, we need to install nvidia-docker-plugin. You can follow the installation instructions given at https://github.com/NVIDIA/nvidia-docker/wiki.

Step 3 − By using the following command, you can pull the MXNet Docker image −

$ sudo docker pull mxnet/python:gpu

Now, to see whether the mxnet/python Docker image was pulled successfully, we can list the Docker images as follows −

$ sudo docker images

For the fastest inference speeds with MXNet, it is recommended to use the latest MXNet with Intel MKL-DNN. Check the commands below −

$ sudo docker pull mxnet/python:1.3.0_cpu_mkl
$ sudo docker images

From source

To build the MXNet shared library from source with GPU support, we first need to set up the environment for CUDA and cuDNN as follows −

Download and install the CUDA toolkit; CUDA 9.2 is recommended here.
Next, download cuDNN 7.1.4.
Now unzip the file, change to the cuDNN root directory, and move the headers and libraries to the local CUDA Toolkit folder as follows −

tar xvzf cudnn-9.2-linux-x64-v7.1
sudo cp -P cuda/include/cudnn.h /usr/local/cuda/include
sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
sudo ldconfig

After setting up the environment for CUDA and cuDNN, follow the steps below to build the MXNet shared library from source −

Step 1 − First, we need to install the prerequisite packages. These dependencies are required on Ubuntu version 16.04 or later.

sudo apt-get update
sudo apt-get install -y build-essential git ninja-build ccache libopenblas-dev libopencv-dev cmake

Step 2 − In this step, we will download the MXNet source and configure it. First, let us clone the repository and copy the build configuration by using the following commands −

git clone --recursive https://github.com/apache/incubator-mxnet.git mxnet
cd mxnet
cp config/linux_gpu.cmake config.cmake # for build with CUDA

Step 3 − By using the following commands, you can build the MXNet core shared library −

rm -rf build
mkdir -p build && cd build
cmake -GNinja ..
cmake --build .

Two important points regarding the above step are as follows −

If you want to build the Debug version, specify it as follows −

cmake -DCMAKE_BUILD_TYPE=Debug -GNinja ..

To set the number of parallel compilation jobs, specify the following −

cmake --build . --parallel N

Once you have successfully built the MXNet core shared library, you will find libmxnet.so in the build folder of your MXNet project root. It is required to install the (optional) language bindings.
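
The configure and build commands above differ only in the build type and the number of parallel jobs, so they can be assembled programmatically. The helper below is a hypothetical convenience function that merely builds the argument lists from the flags shown above. It does not run CMake. Passing the lists to subprocess.run from inside the build directory is left to the reader.

```python
def cmake_commands(debug=False, jobs=None):
    """Return (configure, build) argument lists for the build step above."""
    configure = ["cmake"]
    if debug:
        # Optional: request a Debug build instead of the default.
        configure.append("-DCMAKE_BUILD_TYPE=Debug")
    configure += ["-GNinja", ".."]

    build = ["cmake", "--build", "."]
    if jobs is not None:
        if jobs < 1:
            raise ValueError("jobs must be a positive integer")
        # Optional: limit or raise the number of parallel compile jobs.
        build += ["--parallel", str(jobs)]
    return configure, build


configure, build = cmake_commands(debug=True, jobs=4)
```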

Central Processing Unit (CPU)

Here, we will use the Pip, Docker, and Source methods to install MXNet when we are using a CPU for processing −

By using Pip method

You can use the following command to install MXNet on your Linux OS −

pip install mxnet

Apache MXNet also offers MKL-DNN enabled pip packages, which are much faster when running on Intel hardware.

pip install mxnet-mkl
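
After installing, it can be useful to confirm which MXNet distribution pip actually installed. The sketch below uses only the Python standard library (importlib.metadata, available from Python 3.8) and checks the package names mentioned in this guide. The function name and the list of candidates are assumptions for illustration, and other variants exist on PyPI.

```python
from importlib import metadata

# Package names taken from this guide; not an exhaustive list.
CANDIDATES = ("mxnet", "mxnet-mkl", "mxnet-cu92", "mxnet-cu92mkl", "mxnet-cu101mkl")


def installed_mxnet_packages(candidates=CANDIDATES):
    """Map each candidate distribution name to its installed version, or None."""
    found = {}
    for name in candidates:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in installed_mxnet_packages().items():
        print(f"{name}: {version or 'not installed'}")
```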

By using Docker

You can find Docker images with MXNet on DockerHub at https://hub.docker.com/u/mxnet. Let us go through the steps below to install MXNet by using Docker with CPU −

Step 1 − First, install Docker on our machine by following the Docker installation instructions available at https://docs.docker.com/engine/install/ubuntu/.

Step 2 − By using the following command, you can pull the MXNet Docker image:

$ sudo docker pull mxnet/python

Now, to see whether the mxnet/python Docker image was pulled successfully, we can list the Docker images as follows −

$ sudo docker images

For the fastest inference speeds with MXNet, it is recommended to use the latest MXNet with Intel MKL-DNN. Check the commands below −

$ sudo docker pull mxnet/python:1.3.0_cpu_mkl
$ sudo docker images

From source

To build the MXNet shared library from source with CPU, follow the steps below −

Step 1 − First, we need to install the prerequisite packages. These dependencies are required on Ubuntu version 16.04 or later.

sudo apt-get update

sudo apt-get install -y build-essential git ninja-build ccache libopenblas-dev libopencv-dev cmake

Step 2 − In this step, we will download the MXNet source and configure it. First, let us clone the repository and copy the build configuration by using the following commands:

git clone --recursive https://github.com/apache/incubator-mxnet.git mxnet

cd mxnet
cp config/linux.cmake config.cmake

Step 3 − By using the following commands, you can build the MXNet core shared library:

rm -rf build
mkdir -p build && cd build
cmake -GNinja ..
cmake --build .

Two important points regarding the above step are as follows −

If you want to build the Debug version, specify it as follows:

cmake -DCMAKE_BUILD_TYPE=Debug -GNinja ..

To set the number of parallel compilation jobs, specify the following −

cmake --build . --parallel N

Once you have successfully built the MXNet core shared library, you will find libmxnet.so in the build folder of your MXNet project root. It is required to install the (optional) language bindings.

MacOS

We can install MXNet on MacOS in the following ways −

Graphics Processing Unit (GPU)

If you plan to build MXNet on MacOS with GPU support, no Pip or Docker method is available. The only method in this case is to build it from source.

From source

To build the MXNet shared library from source with GPU support, we first need to set up the environment for CUDA and cuDNN. Follow the NVIDIA CUDA Installation Guide for macOS, available at https://docs.nvidia.com, and the cuDNN Installation Guide, available at https://docs.nvidia.com/deeplearning.

Please note that CUDA stopped supporting macOS in 2019, and future versions of CUDA may also not support macOS.

Once you have set up the environment for CUDA and cuDNN, follow the steps given below to install MXNet from source on OS X (Mac) −

Step 1 − Because we need some dependencies on OS X, we first need to install the prerequisite packages.

xcode-select --install #Install OS X Developer Tools

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" #Install Homebrew

brew install cmake ninja ccache opencv # Install dependencies

We can also build MXNet without OpenCV, as OpenCV is an optional dependency.

Step 2 − In this step, we will download the MXNet source and configure it. First, let us clone the repository and copy the build configuration by using the following commands −

git clone --recursive https://github.com/apache/incubator-mxnet.git mxnet

cd mxnet
cp config/darwin.cmake config.cmake

For a GPU-enabled build, it is necessary to install the CUDA dependencies first. When one tries to build with GPU support on a machine without a GPU, the MXNet build cannot autodetect the GPU architecture; in such cases MXNet will target all available GPU architectures.

Step 3 − By using the following commands, you can build the MXNet core shared library −

rm -rf build
mkdir -p build && cd build
cmake -GNinja ..
cmake --build .

Two important points regarding the above step are as follows −

If you want to build the Debug version, specify it as follows −

cmake -DCMAKE_BUILD_TYPE=Debug -GNinja ..

To set the number of parallel compilation jobs, specify the following:

cmake --build . --parallel N

Once you have successfully built the MXNet core shared library, you will find libmxnet.dylib in the build folder of your MXNet project root. It is required to install the (optional) language bindings.

Central Processing Unit (CPU)

Here, we will use the Pip, Docker, and Source methods to install MXNet when we are using a CPU for processing −

By using Pip method

You can use the following command to install MXNet on MacOS −

pip install mxnet

By using Docker

You can find Docker images with MXNet on DockerHub at https://hub.docker.com/u/mxnet. Let us go through the steps below to install MXNet by using Docker with CPU −

Step 1 − First, install Docker on our machine by following the Docker installation instructions available at https://docs.docker.com/docker-for-mac.

Step 2 − By using the following command, you can pull the MXNet Docker image −

$ docker pull mxnet/python

Now, to see whether the mxnet/python Docker image was pulled successfully, we can list the Docker images as follows −

$ docker images

For the fastest inference speeds with MXNet, it is recommended to use the latest MXNet with Intel MKL-DNN. Check the commands below −

$ docker pull mxnet/python:1.3.0_cpu_mkl
$ docker images

From source

Follow the steps given below to install MXNet from source on OS X (Mac) −

Step 1 − Because we need some dependencies on OS X, we first need to install the prerequisite packages.

xcode-select --install #Install OS X Developer Tools
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" #Install Homebrew
brew install cmake ninja ccache opencv # Install dependencies

We can also build MXNet without OpenCV, as OpenCV is an optional dependency.

Step 2 − In this step, we will download the MXNet source and configure it. First, let us clone the repository and copy the build configuration by using the following commands −

git clone --recursive https://github.com/apache/incubator-mxnet.git mxnet

cd mxnet

cp config/darwin.cmake config.cmake

Step 3 − By using the following commands, you can build the MXNet core shared library:

rm -rf build
mkdir -p build && cd build
cmake -GNinja ..
cmake --build .

Two important points regarding the above step are as follows −

If you want to build the Debug version, specify it as follows −

cmake -DCMAKE_BUILD_TYPE=Debug -GNinja ..

To set the number of parallel compilation jobs, specify the following −

cmake --build . --parallel N

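Once you have successfully built the MXNet core shared library, you will find libmxnet.dylib in the build folder of your MXNet project root. It is required to install the (optional) language bindings.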

Windows OS

To install MXNet on Windows, the following are the prerequisites −

Minimum System Requirements

Windows 7, 10, Server 2012 R2, or Server 2016
Visual Studio 2015 or 2017 (any type)
Python 2.7 or 3.6
pip

Recommended System Requirements

Windows 10, Server 2012 R2, or Server 2016
Visual Studio 2017
At least one NVIDIA CUDA-enabled GPU
MKL-enabled CPU: Intel® Xeon® processor, Intel® Core™ processor family, Intel Atom® processor, or Intel® Xeon Phi™ processor
Python 2.7 or 3.6
pip

Graphics Processing Unit (GPU)

By using Pip method−

If you plan to use MXNet on Windows with NVIDIA GPUs, there are two options for installing MXNet with CUDA support from a Python package −

Install with CUDA Support

Below are the steps to set up MXNet with CUDA.

Step 1 − First, install Microsoft Visual Studio 2017 or Microsoft Visual Studio 2015.

Step 2 − Next, download and install NVIDIA CUDA. It is recommended to use CUDA version 9.2 or 9.0, because some issues with CUDA 9.1 have been identified in the past.

Step 3 − Now, download and install NVIDIA cuDNN.

Step 4 − Finally, install MXNet with CUDA by using the following pip command −

pip install mxnet-cu92

Install with CUDA and MKL Support

Below are the steps to set up MXNet with CUDA and MKL.

Step 1 − First, install Microsoft Visual Studio 2017 or Microsoft Visual Studio 2015.

Step 2 − Next, download and install Intel MKL.

Step 3 − Now, download and install NVIDIA CUDA.

Step 4 − Now, download and install NVIDIA cuDNN.

Step 5 − Finally, install MXNet with CUDA and MKL by using the following pip command.

pip install mxnet-cu92mkl

From source

To build the MXNet core library from source with GPU support, we have the following two options −

Option 1− Build with Microsoft Visual Studio 2017

In order to build and install MXNet yourself by using Microsoft Visual Studio 2017, you need the following dependencies.

Install/update Microsoft Visual Studio.

If Microsoft Visual Studio is not already installed on your machine, first download and install it.
The installer will prompt you about installing Git; install it as well.
If Microsoft Visual Studio is already installed on your machine but you want to update it, proceed to the next step to modify your installation. There you will also be given the opportunity to update Microsoft Visual Studio.

Follow the instructions for opening the Visual Studio Installer, available at https://docs.microsoft.com/en-us, to modify individual components.

In the Visual Studio Installer application, update as required. After that, look for and check VC++ 2017 version 15.4 v14.11 toolset, and click Modify.

Now, by using the following command, change the toolset version of Microsoft VS2017 to v14.11 −

"C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build\vcvars64.bat" -vcvars_ver=14.11

Next, you need to download and install CMake, available at https://cmake.org/download/. It is recommended to use CMake v3.12.2, because it is tested with MXNet.

Now, download and run the OpenCV package available at https://sourceforge.net/projects/opencvlibrary/, which will unzip several files. It is up to you whether to place them in another directory. Here, we will use the path C:\utils (created with mkdir C:\utils) as our default path.

Next, we need to set the environment variable OpenCV_DIR to point to the OpenCV build directory that we have just unzipped. For this, open the command prompt and type set OpenCV_DIR=C:\utils\opencv\build.

One important point is that if you do not have the Intel MKL (Math Kernel Library) installed, you can install it.

Another open-source package you can use is OpenBLAS. For the remaining instructions, we assume that you are using OpenBLAS.

So, download the OpenBLAS package, which is available at https://sourceforge.net, unzip the file, rename it to OpenBLAS, and put it under C:\utils.

Next, we need to set the environment variable OpenBLAS_HOME to point to the OpenBLAS directory that contains the include and lib directories. For this, open the command prompt and type set OpenBLAS_HOME=C:\utils\OpenBLAS.
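
Before running CMake, it may help to check that these environment variables point where they should. The following sketch is a hypothetical helper using only the standard library. It assumes, as described above, that OpenBLAS_HOME must contain include and lib subdirectories and that OpenCV_DIR must simply be an existing directory.

```python
import os


def check_build_env(env=None):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = []

    opencv_dir = env.get("OpenCV_DIR")
    if not opencv_dir or not os.path.isdir(opencv_dir):
        problems.append("OpenCV_DIR is unset or not a directory")

    openblas_home = env.get("OpenBLAS_HOME")
    if not openblas_home or not os.path.isdir(openblas_home):
        problems.append("OpenBLAS_HOME is unset or not a directory")
    else:
        # The OpenBLAS directory is expected to hold headers and libraries.
        for sub in ("include", "lib"):
            if not os.path.isdir(os.path.join(openblas_home, sub)):
                problems.append(f"OpenBLAS_HOME has no '{sub}' directory")
    return problems


if __name__ == "__main__":
    for problem in check_build_env():
        print("warning:", problem)
```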

Now, download and install CUDA, available at https://developer.nvidia.com. Note that if you already had CUDA installed and then installed Microsoft VS2017, you need to reinstall CUDA now so that you get the CUDA toolkit components for Microsoft VS2017 integration.

Next, you need to download and install cuDNN.

Next, you need to download and install Git, which is available at https://gitforwindows.org/.

Once you have installed all the required dependencies, follow the steps given below to build the MXNet source code −

Step 1 − Open the command prompt in Windows.

Step 2 − Now, by using the following commands, download the MXNet source code from GitHub:

cd C:\

git clone https://github.com/apache/incubator-mxnet.git --recursive

Step 3 − Next, verify the following −

The CUDNN_INCLUDE and CUDNN_LIBRARY variables (passed to CMake as -DCUDNN_INCLUDE and -DCUDNN_LIBRARY) point to the include folder and the cudnn.lib file of your CUDA installation.

C:\incubator-mxnet is the location of the source code you cloned in the previous step.

Step 4 − Next, by using the following commands, create a build directory and change into it, for example −

mkdir C:\incubator-mxnet\build
cd C:\incubator-mxnet\build

Step 5 − Now, by using cmake, configure the MXNet build and generate the Visual Studio solution as follows −

cmake -G "Visual Studio 15 2017 Win64" -T cuda=9.2,host=x64 -DUSE_CUDA=1 -DUSE_CUDNN=1 -DUSE_NVRTC=1 -DUSE_OPENCV=1 -DUSE_OPENMP=1 -DUSE_BLAS=open -DUSE_LAPACK=1 -DUSE_DIST_KVSTORE=0 -DCUDA_ARCH_LIST=Common -DCUDA_TOOLSET=9.2 -DCUDNN_INCLUDE=C:\cuda\include -DCUDNN_LIBRARY=C:\cuda\lib\x64\cudnn.lib "C:\incubator-mxnet"

Step 6 − Once CMake has completed successfully, use the following command to compile the MXNet source code −

msbuild mxnet.sln /p:Configuration=Release;Platform=x64 /maxcpucount

Option 2− Build with Microsoft Visual Studio 2015

To build and install MXNet yourself by using Microsoft Visual Studio 2015, you need the following dependencies.

Install or update Microsoft Visual Studio 2015. The minimum requirement for building MXNet from source is Update 3 of Microsoft Visual Studio 2015. You can upgrade it from the Tools → Extensions and Updates… | Product Updates menu.

Next, download and install CMake, which is available at https://cmake.org/download/. CMake v3.12.2 is recommended because it has been tested with MXNet.

Now, download and run the OpenCV package available at https://excellmedia.dl.sourceforge.net, which will unzip several files. It is up to you whether to place them in another directory.

Next, set the environment variable OpenCV_DIR to point to the OpenCV build directory that you have just unzipped. To do so, open a command prompt and type set OpenCV_DIR=C:\opencv\build\x64\vc14\bin.

If you do not have the Intel Math Kernel Library (MKL) installed, you can install it.

Another open-source package you can use is OpenBLAS. The instructions that follow assume that you are using OpenBLAS.

Download the OpenBLAS package available at https://excellmedia.dl.sourceforge.net, unzip the file, rename the folder to OpenBLAS, and put it under C:\utils.

Next, set the environment variable OpenBLAS_HOME to point to the OpenBLAS directory that contains the include and lib directories. Typically, you can find this directory in C:\Program files (x86)\OpenBLAS\.

Note that if you already had CUDA installed and then installed Microsoft Visual Studio 2015, you need to reinstall CUDA so that you get the CUDA toolkit components for Microsoft Visual Studio integration.

Next, download and install cuDNN.

Now, set the environment variable CUDACXX to point to the CUDA compiler (for example, C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\bin\nvcc.exe).

Similarly, set the environment variable CUDNN_ROOT to point to the cuDNN directory that contains the include, lib, and bin directories (for example, C:\Downloads\cudnn-9.1-windows7-x64-v7\cuda).

Once you have installed all the required dependencies, follow the steps below to build the MXNet source code.

Step 1 − First, download the MXNet source code from GitHub:

cd C:\
git clone https://github.com/apache/incubator-mxnet.git --recursive

Step 2 − Next, use CMake to create a Visual Studio solution in ./build.

Step 3 − Now, in Visual Studio, open the solution file (.sln) and compile it. This produces a library called mxnet.dll in the ./build/Release/ or ./build/Debug folder.

Step 4 − Once CMake has completed successfully, you can also compile the MXNet source code from the command line by using the following command:

msbuild mxnet.sln /p:Configuration=Release;Platform=x64 /maxcpucount

Central Processing Unit (CPU)

Here, we use the Pip, Docker, and source methods to install MXNet when the CPU is used for processing.

By using Pip method

If you plan to use MXNet on Windows with CPUs, there are two options for installing it as a Python package:

Install with CPUs

Use the following command to install the CPU build of MXNet with Python:

pip install mxnet

Install with Intel CPUs

As discussed above, MXNet has experimental support for Intel MKL as well as MKL-DNN. Use the following command to install MXNet for Intel CPUs with Python:

pip install mxnet-mkl
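
After either pip command completes, a short smoke test can confirm that the package imports and computes on the CPU. The following is a minimal sketch; the function name cpu_smoke_test is hypothetical, and it returns None instead of failing when MXNet is not installed.

```python
def cpu_smoke_test():
    """Run a tiny NDArray computation on the CPU; return None if MXNet is missing."""
    try:
        import mxnet as mx
    except ImportError:
        return None
    # Create a 2x3 array of ones on the default (CPU) context and double it.
    doubled = mx.nd.ones((2, 3)) * 2
    return doubled.asnumpy().tolist()


if __name__ == "__main__":
    result = cpu_smoke_test()
    print("MXNet is not installed" if result is None else result)
```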

By using Docker

You can find Docker images with MXNet on Docker Hub at https://hub.docker.com/u/mxnet. Follow the steps below to install MXNet with CPU support by using Docker.

Step 1 − First, install Docker on your machine by following the Docker installation instructions at https://docs.docker.com/docker-for-mac/install.

Step 2 − Pull the MXNet Docker image by using the following command:

$ docker pull mxnet/python

To check whether the mxnet/python Docker image was pulled successfully, list the Docker images as follows:

$ docker images

For the fastest inference speeds with MXNet, it is recommended to use the latest MXNet with Intel MKL-DNN.

Check the commands below:

$ docker pull mxnet/python:1.3.0_cpu_mkl
$ docker images
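
The check that an image was pulled can also be automated. The sketch below assumes the Docker CLI is on the PATH and uses its --format option to print one repository:tag pair per line. The function names list_local_images and image_present are hypothetical helpers.

```python
import subprocess


def list_local_images():
    """Return local images as 'repository:tag' strings by calling the Docker CLI."""
    result = subprocess.run(
        ["docker", "images", "--format", "{{.Repository}}:{{.Tag}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def image_present(images, name):
    """Return True if `name` (for example 'mxnet/python:1.3.0_cpu_mkl') is listed."""
    return name in {line.strip() for line in images}
```

For example, image_present(list_local_images(), "mxnet/python:1.3.0_cpu_mkl") is True only after the pull above has succeeded.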

Installing MXNet On Cloud and Devices

This section describes how to install Apache MXNet on the cloud and on devices. Let us begin with installing MXNet on the cloud.

Installing MXNet On Cloud

Apache MXNet is also available on several cloud providers with Graphical Processing Unit (GPU) support. Two other kinds of support you can find are as follows:

GPU/CPU-hybrid support for use cases like scalable inference.
Fractional GPU support with AWS Elastic Inference.

The following cloud providers offer GPU support for Apache MXNet with different virtual machines:

The Alibaba Console

With the Alibaba Console, you can create the NVIDIA GPU Cloud Virtual Machine (VM), available at https://docs.nvidia.com/ngc, and use Apache MXNet.

Amazon Web Services

Amazon Web Services also provides GPU support and offers the following services for Apache MXNet:

Amazon SageMaker

It manages the training and deployment of Apache MXNet models.

AWS Deep Learning AMI

It provides a preinstalled Conda environment for both Python 2 and Python 3 with Apache MXNet, CUDA, cuDNN, MKL-DNN, and AWS Elastic Inference.

Dynamic Training on AWS

It supports training through an experimental manual EC2 setup as well as a semi-automated CloudFormation setup.

You can also use the NVIDIA VM, available at https://aws.amazon.com, with Amazon Web Services.

Google Cloud Platform

Google also provides an NVIDIA GPU cloud image, available at https://console.cloud.google.com, for working with Apache MXNet.

Microsoft Azure

Microsoft Azure Marketplace also provides an NVIDIA GPU cloud image, available at https://azuremarketplace.microsoft.com, for working with Apache MXNet.

Oracle Cloud

Oracle also provides an NVIDIA GPU cloud image, available at https://docs.cloud.oracle.com, for working with Apache MXNet.

Central Processing Unit (CPU)

Apache MXNet works on every cloud provider’s CPU-only instances. There are various installation methods, such as:

Python pip install instructions.
Docker instructions.
A preinstalled option, such as the AWS Deep Learning AMI from Amazon Web Services (which has a preinstalled Conda environment for both Python 2 and Python 3 with MXNet and MKL-DNN).

Installing MXNet on Devices

Let us learn how to install MXNet on devices.

Raspberry Pi

You can also run Apache MXNet on Raspberry Pi 3B devices, because MXNet supports the ARM-based Raspbian OS. To run MXNet smoothly on the Raspberry Pi 3, it is recommended to have a device with more than 1 GB of RAM and an SD card with at least 4 GB of free space.
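
The memory and disk recommendations can be checked on the device before starting. The sketch below assumes a Linux system that exposes /proc/meminfo. The function names and the default thresholds are illustrative assumptions, so adjust them if the total memory your device reports sits just below its nominal size.

```python
import shutil

GIB = 1024 ** 3


def total_ram_bytes(meminfo_path="/proc/meminfo"):
    """Read total RAM from a Linux meminfo file (the MemTotal value is in kB)."""
    with open(meminfo_path) as f:
        for line in f:
            if line.startswith("MemTotal:"):
                return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found in " + meminfo_path)


def meets_recommendation(ram_bytes, free_disk_bytes, min_ram=1 * GIB, min_free=4 * GIB):
    """Return True if both the RAM and the free disk space reach the thresholds."""
    return ram_bytes >= min_ram and free_disk_bytes >= min_free


if __name__ == "__main__":
    free = shutil.disk_usage("/").free  # free bytes on the root file system
    print(meets_recommendation(total_ram_bytes(), free))
```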

You can build MXNet for the Raspberry Pi and install the Python bindings for the library in the following ways:

Quick installation

The pre-built Python wheel can be used on a Raspberry Pi 3B with Stretch for quick installation. An important caveat of this method is that you may need to install several dependencies to get Apache MXNet to work.

Docker installation

To install Docker on your machine, follow the Docker installation instructions available at https://docs.docker.com/engine/install/ubuntu/. The Community Edition (CE) is sufficient for this purpose.

Native Build (from source)

To install MXNet from source, we need to follow the two steps below.

Step 1

Build the shared library from the Apache MXNet C++ source code

To build the shared library on Raspbian Wheezy and later, we need the following dependencies:

Git − It is required to pull code from GitHub.
Libblas − It is required for linear algebraic operations.
Libopencv − It is required for computer vision related operations. However, it is optional if you would like to save RAM and disk space.
C++ compiler − It is required to compile and build the MXNet source code. The supported compilers, which must support C++11, are G++ (4.8 or later) and Clang (3.9-6).

Use the following commands to install the above-mentioned dependencies:

sudo apt-get update
sudo apt-get -y install git cmake ninja-build build-essential g++-4.9 c++-4.9 liblapack* \
    libblas* libopencv* \
    libopenblas* python3-dev python-dev virtualenv

Next, clone the MXNet source code repository. To do so, use the following git commands in your home directory:

git clone https://github.com/apache/incubator-mxnet.git --recursive

cd incubator-mxnet

Now, build the shared library by using the following commands:

mkdir -p build && cd build
cmake \
-DUSE_SSE=OFF \
-DUSE_CUDA=OFF \
-DUSE_OPENCV=ON \
-DUSE_OPENMP=ON \
-DUSE_MKL_IF_AVAILABLE=OFF \
-DUSE_SIGNAL_HANDLER=ON \
-DCMAKE_BUILD_TYPE=Release \
-GNinja ..
ninja -j$(nproc)

Executing the above commands starts the build process, which takes a couple of hours to finish. When it completes, you will find a file named libmxnet.so in the build directory.

Step 2

Install the supported language-specific packages for Apache MXNet

In this step, we install the MXNet Python bindings. To do so, run the following commands in the MXNet directory:

cd python
pip install --upgrade pip
pip install -e .

Alternatively, you can create a whl package installable with pip by using the following command:

ci/docker/runtime_functions.sh build_wheel python/ $(realpath build)

NVIDIA Jetson Devices

You can also run Apache MXNet on NVIDIA Jetson devices, such as the TX2 or Nano, because MXNet supports the Ubuntu AArch64-based OS. To run MXNet on NVIDIA Jetson devices, CUDA must be installed on your Jetson device.

You can build MXNet for NVIDIA Jetson devices in the following ways:

By using a Jetson MXNet pip wheel for Python development
From source

However, before building MXNet in either of these ways, you need to install the following dependencies on your Jetson device.

Python Dependencies

To use the Python API, we need the following dependencies:

sudo apt update
sudo apt -y install \
   build-essential \
   git \
   graphviz \
   libatlas-base-dev \
   libopencv-dev \
   python-pip
sudo pip install --upgrade \
   pip \
   setuptools
sudo pip install \
   graphviz==0.8.4 \
   jupyter \
   numpy==1.15.2

Clone the MXNet source code repository

Clone the MXNet source code repository by using the following git command in your home directory:

git clone --recursive https://github.com/apache/incubator-mxnet.git mxnet

Setup environment variables

Add the following lines to the .profile file in your home directory:

export PATH=/usr/local/cuda/bin:$PATH
export MXNET_HOME=$HOME/mxnet/
export PYTHONPATH=$MXNET_HOME/python:$PYTHONPATH

Now, apply the change immediately with the following command:

source .profile

Configure CUDA

Before configuring CUDA, check which version of CUDA is running by using nvcc:

nvcc --version

If more than one CUDA version is installed on your device or computer and you want to switch CUDA versions, use the following commands to replace the symbolic link with one that points to the version you want:

sudo rm /usr/local/cuda
sudo ln -s /usr/local/cuda-10.0 /usr/local/cuda

The above commands switch to CUDA 10.0, which is preinstalled on the NVIDIA Jetson Nano.
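
To confirm which installation the symbolic link currently points to, you can resolve it. The following is a minimal sketch; the function name active_cuda_dir is hypothetical, and the default path is the one used in the commands above.

```python
import os


def active_cuda_dir(link="/usr/local/cuda"):
    """Return the real directory that `link` resolves to, or None if it is absent."""
    if not os.path.exists(link):
        return None
    return os.path.realpath(link)


if __name__ == "__main__":
    print(active_cuda_dir())
```

After the switch shown above, the printed path is expected to end in cuda-10.0.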

Once you are done with the above prerequisites, you can install MXNet on NVIDIA Jetson devices. Let us look at the ways in which you can install MXNet.

By using a Jetson MXNet pip wheel for Python development − If you want to use a prepared Python wheel, download one of the following to your Jetson and run it:

MXNet 1.4.0 (for Python 3) available at https://docs.docker.com
MXNet 1.4.0 (for Python 2) available at https://docs.docker.com

Native Build (from source)

To install MXNet from source, we need to follow the two steps below.

Step 1

Build the shared library from the Apache MXNet C++ source code

To build the shared library from the Apache MXNet C++ source code, you can either use the Docker method or do it manually.

Docker method

In this method, you first need to install Docker and be able to run it without sudo (which is also explained in the previous steps). Once done, run the following to execute cross-compilation via Docker:

$MXNET_HOME/ci/build.py -p jetson

Manual

In this method, you first copy the Jetson cross-compilation Makefile to config.mk (with the command below) so that MXNet is built with CUDA bindings and can leverage the Graphical Processing Units (GPUs) on NVIDIA Jetson devices:

cp $MXNET_HOME/make/crosscompile.jetson.mk config.mk

After copying the file, edit config.mk to make some additional changes for the NVIDIA Jetson device.

To do so, update the following settings:

Update the CUDA path: USE_CUDA_PATH = /usr/local/cuda
Add -gencode arch=compute_62,code=sm_62 to the CUDA_ARCH setting.
Update the NVCC settings: NVCCFLAGS := -m64
Turn on OpenCV: USE_OPENCV = 1

Now, to ensure that MXNet builds with Pascal’s hardware-level low-precision acceleration, edit the Mshadow Makefile as follows:

MSHADOW_CFLAGS += -DMSHADOW_USE_PASCAL=1

Finally, build the complete Apache MXNet library by using the following commands:

cd $MXNET_HOME
make -j $(nproc)

Executing the above commands starts the build process, which takes a couple of hours to finish. When it completes, you will find a file named libmxnet.so in the mxnet/lib directory.

Step 2

Install the Apache MXNet Python Bindings

In this step, we install the MXNet Python bindings. To do so, run the following commands:

cd $MXNET_HOME/python
sudo pip install -e .

After completing the above steps, you are ready to run MXNet on your NVIDIA Jetson TX2 or Nano. You can verify the installation by running the following in a Python interpreter:

import mxnet
mxnet.__version__

If everything is working properly, it returns the version number.
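
Because the Jetson build targets the GPU, it can be useful to go one step further and run a small computation on the device. The sketch below is an illustrative check rather than an official test: the function name gpu_smoke_test is hypothetical, and it returns None when MXNet is not installed or no GPU is visible.

```python
def gpu_smoke_test():
    """Add two small arrays on GPU 0; return None if MXNet or a GPU is unavailable."""
    try:
        import mxnet as mx
    except ImportError:
        return None
    if mx.context.num_gpus() == 0:
        return None
    # Allocate the array directly on the first GPU and add it to itself there.
    x = mx.nd.ones((2, 2), ctx=mx.gpu(0))
    return (x + x).asnumpy().tolist()


if __name__ == "__main__":
    print(gpu_smoke_test())
```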

Apache MXNet - Toolkits and Ecosystem

To support the research and development of deep learning applications across many fields, Apache MXNet provides a rich ecosystem of toolkits, libraries, and more. Let us explore them.

ToolKits

The following are some of the most widely used and important toolkits provided by MXNet:

GluonCV

As the name implies, GluonCV is a Gluon toolkit for computer vision powered by MXNet. It provides implementations of state-of-the-art deep learning (DL) algorithms in computer vision (CV). With the help of the GluonCV toolkit, engineers, researchers, and students can validate new ideas and learn CV easily.

Given below are some of the features of GluonCV:

Training scripts for reproducing state-of-the-art results reported in the latest research.
More than 170 high-quality pretrained models.
A flexible development pattern.
Easy to optimize and deploy without retaining a heavyweight DL framework.
Carefully designed APIs that greatly lessen the implementation intricacy.
Community support.
Easy-to-understand implementations.

The GluonCV toolkit supports the following applications:

Image Classification
Object Detection
Semantic Segmentation
Instance Segmentation
Pose Estimation
Video Action Recognition

We can install GluonCV by using pip as follows:

pip install --upgrade mxnet gluoncv
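
Once installed, models are obtained from the GluonCV model zoo. The following is a minimal sketch that builds an untrained ResNet-18 classifier and runs it on one random image to show the expected tensor shapes. The function name classify_random_image is hypothetical, and the random input stands in for a real preprocessed image.

```python
def classify_random_image():
    """Build an untrained ResNet-18 and return the output shape for one fake image.

    Returns None when MXNet or GluonCV is not installed.
    """
    try:
        import mxnet as mx
        from gluoncv import model_zoo
    except ImportError:
        return None
    # pretrained=False avoids downloading weights; use pretrained=True for real inference.
    net = model_zoo.get_model("resnet18_v1", pretrained=False)
    net.initialize()
    image = mx.nd.random.uniform(shape=(1, 3, 224, 224))  # batch, channels, height, width
    return net(image).shape


if __name__ == "__main__":
    print(classify_random_image())
```

The output has one score per class for each image in the batch, so a single image yields a shape of (1, 1000) with the default 1000 ImageNet classes.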

GluonNLP

As the name implies, GluonNLP is a Gluon toolkit for Natural Language Processing (NLP) powered by MXNet. It provides implementations of state-of-the-art deep learning (DL) models in NLP.

With the help of the GluonNLP toolkit, engineers, researchers, and students can build blocks for text data pipelines and models. Based on these models, they can quickly prototype research ideas and products.

Given below are some of the features of GluonNLP:

Training scripts for reproducing state-of-the-art results reported in the latest research.
A set of pretrained models for common NLP tasks.
Carefully designed APIs that greatly lessen the implementation intricacy.
Community support.
Tutorials to help you get started on new NLP tasks.

The following are the NLP tasks we can implement with the GluonNLP toolkit:

Word Embedding
Language Model
Machine Translation
Text Classification
Sentiment Analysis
Natural Language Inference
Text Generation
Dependency Parsing
Named Entity Recognition
Intent Classification and Slot Labeling

We can install GluonNLP by using pip as follows:

pip install --upgrade mxnet gluonnlp

GluonTS

As the name implies, GluonTS is a Gluon toolkit for probabilistic time series modeling powered by MXNet.

It provides the following features:

State-of-the-art (SOTA) deep learning models ready to be trained.
Utilities for loading and iterating over time-series datasets.
Building blocks to define your own model.

With the help of the GluonTS toolkit, engineers, researchers, and students can train and evaluate any of the built-in models on their own data, quickly experiment with different solutions, and come up with a solution for their time series tasks.

They can also use the provided abstractions and building blocks to create custom time series models and rapidly benchmark them against baseline algorithms.

We can install GluonTS by using pip as follows:

pip install gluonts

GluonFR

As the name implies, GluonFR is an Apache MXNet Gluon toolkit for face recognition (FR). It provides the following features:

State-of-the-art (SOTA) deep learning models in face recognition.
Implementations of SoftmaxCrossEntropyLoss, ArcLoss, TripletLoss, RingLoss, CosLoss/AMsoftmax, L2-Softmax, A-Softmax, CenterLoss, ContrastiveLoss, LGM Loss, and others.

To install Gluon Face, we need Python 3.5 or later. We also need to install GluonCV and MXNet first, as follows:

pip install gluoncv --pre
pip install mxnet-mkl --pre --upgrade
pip install mxnet-cuXXmkl --pre --upgrade # if cuda XX is installed

Once you have installed the dependencies, you can install GluonFR in either of the following ways:

From Source

pip install git+https://github.com/THUFutureLab/gluon-face.git@master

Pip

pip install gluonfr

Ecosystem

Now, let us explore MXNet’s rich libraries, packages, and frameworks.

Coach RL

Coach is a Python reinforcement learning (RL) framework created by Intel AI Lab. It enables easy experimentation with state-of-the-art RL algorithms. Coach RL supports Apache MXNet as a backend and allows simple integration of new environments to solve.

To make it easy to extend and reuse existing components, Coach RL decouples the basic reinforcement learning components, such as algorithms, environments, neural network (NN) architectures, and exploration policies.

The following are the agents and supported algorithms of the Coach RL framework:

Value Optimization Agents

Deep Q Network (DQN)
Double Deep Q Network (DDQN)
Dueling Q Network
Mixed Monte Carlo (MMC)
Persistent Advantage Learning (PAL)
Categorical Deep Q Network (C51)
Quantile Regression Deep Q Network (QR-DQN)
N-Step Q Learning
Neural Episodic Control (NEC)
Normalized Advantage Functions (NAF)
Rainbow

Policy Optimization Agents

Policy Gradients (PG)
Asynchronous Advantage Actor-Critic (A3C)
Deep Deterministic Policy Gradients (DDPG)
Proximal Policy Optimization (PPO)
Clipped Proximal Policy Optimization (CPPO)
Generalized Advantage Estimation (GAE)
Sample Efficient Actor-Critic with Experience Replay (ACER)
Soft Actor-Critic (SAC)
Twin Delayed Deep Deterministic Policy Gradient (TD3)

General Agents

Direct Future Prediction (DFP)

Imitation Learning Agents

Behavioral Cloning (BC)
Conditional Imitation Learning

Hierarchical Reinforcement Learning Agents

Hierarchical Actor Critic (HAC)

Deep Graph Library

Deep Graph Library (DGL), developed by the NYU and AWS Shanghai teams, is a Python package that provides easy implementations of Graph Neural Networks (GNNs) on top of MXNet. It also provides easy implementations of GNNs on top of other existing major deep learning libraries such as PyTorch and Gluon.

Deep Graph Library is free software. It is available on Linux distributions later than Ubuntu 16.04, on macOS X, and on Windows 7 or later. It also requires Python 3.5 or later.

The following are the features of DGL:

No migration cost − There is no migration cost for using DGL, because it is built on top of popular existing DL frameworks.

Message passing − DGL provides versatile control over message passing, ranging from low-level operations, such as sending along selected edges, to high-level control, such as graph-wide feature updates.

Smooth learning curve − DGL is easy to learn and use, because its powerful user-defined functions are both flexible and easy to use.

Transparent speed optimization − DGL provides transparent speed optimization through automatic batching of computations and sparse matrix multiplication.

High performance − To achieve maximum efficiency, DGL automatically batches deep neural network (DNN) training on one or many graphs together.

Easy & friendly interface − DGL provides easy and friendly interfaces for edge feature access as well as graph structure manipulation.
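
To make the idea of message passing concrete, the sketch below performs one round of sum aggregation in plain Python. It deliberately does not use the DGL API: the function name sum_aggregate and the list-based graph representation are assumptions chosen only to illustrate the concept.

```python
def sum_aggregate(edges, features):
    """One round of message passing: each node sums the features of its in-neighbours.

    edges    -- list of (source, destination) node-index pairs
    features -- list with one numeric feature per node
    """
    aggregated = [0.0] * len(features)
    for src, dst in edges:
        # The message sent along an edge is simply the source node's feature.
        aggregated[dst] += features[src]
    return aggregated


if __name__ == "__main__":
    # Three nodes with edges 0 -> 1, 0 -> 2, and 1 -> 2.
    print(sum_aggregate([(0, 1), (0, 2), (1, 2)], [1.0, 2.0, 3.0]))  # [0.0, 1.0, 3.0]
```

A graph library batches this per-edge loop into sparse matrix operations, which is the kind of optimization the feature list above refers to.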

InsightFace

InsightFace is a deep learning toolkit for face analysis that provides implementations of state-of-the-art (SOTA) face analysis algorithms in computer vision, powered by MXNet. It provides:

A large set of high-quality pre-trained models.
State-of-the-art (SOTA) training scripts.
Easy optimization and deployment without retaining a heavyweight DL framework.
Carefully designed APIs that greatly lessen the implementation intricacy.
Building blocks to define your own model.

We can install InsightFace by using pip as follows:

pip install --upgrade insightface

Note that before installing InsightFace, you should install the correct MXNet package for your system configuration.

Keras-MXNet

Keras is a high-level neural network (NN) API written in Python, and Keras-MXNet provides MXNet backend support for Keras. It allows Keras to run on top of the high-performance and scalable Apache MXNet DL framework.

The features of Keras-MXNet are mentioned below:

Allows easy, smooth, and fast prototyping through user friendliness, modularity, and extensibility.
Supports both CNNs (Convolutional Neural Networks) and RNNs (Recurrent Neural Networks), as well as combinations of the two.
Runs on both the Central Processing Unit (CPU) and the Graphical Processing Unit (GPU).
Can run on one or multiple GPUs.

To work with this backend, you first need to install keras-mxnet as follows:

pip install keras-mxnet

Now, if you are using GPUs, install MXNet with CUDA 9 support as follows:

pip install mxnet-cu90

If you are using only CPUs, install the basic MXNet package as follows:

pip install mxnet
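
After installation, Keras still has to be told which backend to use. The sketch below assumes the usual Keras convention of a JSON configuration file with a "backend" field, normally located at ~/.keras/keras.json. The helper name set_keras_backend is hypothetical, and the exact fields your Keras version expects should be checked against its documentation.

```python
import json
import os


def set_keras_backend(config_path, backend="mxnet"):
    """Set the "backend" field in a Keras JSON config file, keeping other fields."""
    config = {}
    if os.path.exists(config_path):
        with open(config_path) as f:
            config = json.load(f)
    config["backend"] = backend
    # Create the parent directory if the config file does not exist yet.
    os.makedirs(os.path.dirname(config_path) or ".", exist_ok=True)
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
    return config
```

A typical call would be set_keras_backend(os.path.expanduser("~/.keras/keras.json")), which leaves any other settings in the file unchanged.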

MXBoard

MXBoard is a logging tool, written in Python, that records MXNet data and displays it in TensorBoard. Its interface is meant to follow the tensorboard-pytorch API. It supports most of the data types in TensorBoard.

Some of them are mentioned below:

Graph
Scalar
Histogram
Embedding
Image
Text
Audio
Precision-Recall Curve
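
As an illustration of the Scalar data type, the sketch below logs a series of loss values with MXBoard's SummaryWriter. It assumes the mxboard package is installed (pip install mxboard). The function name log_losses and the tag "loss" are hypothetical choices, and the function returns False instead of failing when MXBoard is missing.

```python
def log_losses(losses, logdir):
    """Write one scalar per step to `logdir`; return False if MXBoard is missing."""
    try:
        from mxboard import SummaryWriter
    except ImportError:
        return False
    # The writer flushes and closes its event file when the with-block exits.
    with SummaryWriter(logdir=logdir) as writer:
        for step, loss in enumerate(losses):
            writer.add_scalar(tag="loss", value=loss, global_step=step)
    return True


if __name__ == "__main__":
    print(log_losses([0.9, 0.6, 0.4], "./logs"))
```

The logged values can then be viewed by pointing TensorBoard at the same directory, for example with tensorboard --logdir=./logs.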

MXFusion

MXFusion is a modular probabilistic programming library with deep learning. It allows us to fully exploit modularity, a key feature of deep learning libraries, for probabilistic programming. It is simple to use and gives users a convenient interface for designing probabilistic models and applying them to real-world problems.

MXFusion is verified on Python 3.4 and later on the macOS and Linux operating systems. To install MXFusion, we first need to install the following dependencies:

MXNet >= 1.3
Networkx >= 2.1

You can install MXFusion with the following pip command:

pip install mxfusion

TVM

Apache TVM，一个面向 CPU、GPU 和专用加速器等硬件后端的开源端到端深度学习编译堆栈，旨在填补以生产力为中心的深度学习框架与以性能为导向的硬件后端之间的差距。随着最新的 MXNet 1.6.0 版本，用户可以利用 Apache（孵化中）TVM 用 Python 编程语言实现高性能算子内核。

Apache TVM 最初实际上是华盛顿大学 Paul G. Allen 计算机科学与工程学院 SAMPL 小组的一个研究项目，现在它正在 Apache 软件基金会（ASF）孵化，由一个 OSC（开源社区）推动，该社区涉及多个行业以及 Apache 方式下的学术机构。

Apache TVM actually started as a research project at the SAMPL group of Paul G. Allen School of Computer Science & Engineering, University of Washington and now it is an effort undergoing incubation at The Apache Software Foundation (ASF) which is driven by an OSC (open source community) that involves multiple industry as well as academic institutions under the Apache way.

Following are the main features of Apache (incubating) TVM −

It simplifies the former C++ based development process.
It enables sharing the same implementation across multiple hardware backends such as CPUs, GPUs, etc.
It compiles DL models from various frameworks such as Keras, MXNet, PyTorch, TensorFlow, CoreML, and DarkNet into minimum deployable modules on diverse hardware backends.
It also provides the infrastructure to automatically generate and optimize tensor operators with better performance.

Xfer

Xfer is a transfer learning framework written in Python. It takes an MXNet model and either trains a meta-model or modifies the model for a new target dataset.

In simple words, Xfer is a Python library that allows users to quickly and easily transfer knowledge stored in deep neural networks (DNNs).

Xfer can be used −

For the classification of data of arbitrary numeric format.
For the common cases of image or text data.
As a pipeline that spans from extracting features to training a repurposer (an object that performs classification in the target task).

Following are the features of Xfer −

Resource efficiency
Data efficiency
Easy access to neural networks
Uncertainty modeling
Rapid prototyping
Utilities for feature extraction from NN

Apache MXNet - System Architecture

This chapter will help you understand the MXNet system architecture. Let us begin by learning about the MXNet modules.

MXNet Modules

The diagram below shows the MXNet system architecture, including the major modules and components of MXNet and their interactions.

In the above diagram −

The modules in blue boxes are user-facing modules.
The modules in green boxes are system modules.
A solid arrow represents a strong dependency, i.e. one module relies heavily on the other's interface.
A dotted arrow represents a light dependency, i.e. a data structure is used for convenience and interface consistency. It could, in fact, be replaced by an alternative.

Let us discuss the user-facing and system modules in more detail.

User-facing Modules

The user-facing modules are as follows −

NDArray − It provides flexible imperative programs for Apache MXNet. NDArrays are dynamic and asynchronous n-dimensional arrays.
KVStore − It acts as an interface for efficient parameter synchronization. In KVStore, KV stands for Key-Value, so it is a key-value store interface.
Data Loading (IO) − This user-facing module is used for efficient distributed data loading and augmentation.
Symbol Execution − It is a static symbolic graph executor. It provides efficient symbolic graph execution and optimization.
Symbol Construction − This user-facing module gives the user a way to construct a computation graph, i.e. the net configuration.

System Modules

The system modules are as follows −

Storage Allocator − As the name suggests, this system module allocates and recycles memory blocks efficiently on the host (CPU) and on different devices (GPUs).
Runtime Dependency Engine − This module schedules and executes operations according to their read/write dependencies.
Resource Manager − The Resource Manager (RM) system module manages global resources such as the random number generator and temporary space.
Operator − The Operator system module consists of all the operators that define static forward and gradient (backpropagation) calculation.

Apache MXNet - System Components

This chapter explains the system components of Apache MXNet in detail. First, we will study the execution engine in MXNet.

Execution Engine

Apache MXNet's execution engine is very versatile. We can use it for deep learning as well as for any domain-specific problem that requires executing a set of functions while respecting their dependencies. It is designed so that functions with dependencies are serialized, whereas functions with no dependencies can be executed in parallel.

Core Interface

The API given below is the core interface of Apache MXNet's execution engine −

virtual void PushSync(Fn exec_fun, Context exec_ctx,
   std::vector<VarHandle> const& const_vars,
   std::vector<VarHandle> const& mutate_vars) = 0;

The above API takes the following arguments −

exec_fun − The function to execute. The core interface lets us push exec_fun, along with its context information and dependencies, to the execution engine.
exec_ctx − The context in which exec_fun should be executed.
const_vars − The variables that the function reads from.
mutate_vars − The variables that the function modifies.

The execution engine guarantees that the execution of any two functions that modify a common variable is serialized in their push order.

Function

Following is the function type of the Apache MXNet execution engine −

using Fn = std::function<void(RunContext)>;

In the above function type, RunContext contains the runtime information, which is determined by the execution engine. RunContext is defined as follows −

struct RunContext {
   // stream pointer which could be safely cast to
   // cudaStream_t* type
   void *stream;
};

Below are some important points about the execution engine's functions −

All the functions are executed by the internal threads of MXNet's execution engine.
It is not a good idea to push a blocking function to the execution engine, because the function will occupy an execution thread and reduce the total throughput.

For this case, MXNet provides another, asynchronous function type as follows −

using Callback = std::function<void()>;
using AsyncFn = std::function<void(RunContext, Callback)>;

In an AsyncFn function we can hand the heavy part over to our own threads, but the execution engine does not consider the function finished until we call the callback function.
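The callback contract can be illustrated outside MXNet. The following is only a toy Python model, not MXNet code: `push_async` and `launch_heavy_work` are hypothetical names, and a `threading.Event` stands in for the engine's notion of "finished".

```python
import threading

results = []

def push_async(async_fn):
    """Toy stand-in for pushing an AsyncFn: returns an Event that is set by the callback."""
    done = threading.Event()
    async_fn(done.set)   # the "engine thread" only runs the cheap launcher
    return done

def launch_heavy_work(on_complete):
    """AsyncFn-style function: start the heavy part on our own thread and return at once."""
    def heavy():
        results.append(sum(range(100_000)))
        on_complete()    # only now is the function treated as finished
    threading.Thread(target=heavy).start()

done = push_async(launch_heavy_work)
done.wait(timeout=5)     # wait until the callback has fired
print(results)
```

The launcher returns immediately, so the thread that called it is free for other work while the heavy part runs elsewhere.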

Context

In Context, we can specify the context in which the function is executed. This usually includes the following −

Whether the function should be run on a CPU or a GPU.
If we specify GPU in the Context, which GPU to use.

There is a big difference between Context and RunContext: Context holds the device type and device ID, whereas RunContext holds information that can be decided only at runtime.

VarHandle

VarHandle, used to specify the dependencies of functions, is like a token (provided by the execution engine) that we can use to represent the external resources a function can modify or use.

Why do we need VarHandle at all? Because the Apache MXNet engine is designed to be decoupled from the other MXNet modules.

Following are some important points about VarHandle −

It is lightweight, so creating, deleting, or copying a variable incurs little cost.
We need to specify the immutable variables, i.e. the variables that will only be read, in const_vars.
We need to specify the mutable variables, i.e. the variables that will be modified, in mutate_vars.
The rule the execution engine uses to resolve dependencies among functions is this: the execution of any two functions is serialized in their push order when at least one of them modifies a variable they have in common.
To create a new variable, we can use the NewVar() API.
To delete a variable, we can use the PushDelete API.

Let us understand how this works with a simple example −

Suppose we have two functions, F1 and F2, and both mutate the variable V2. In that case, F2 is guaranteed to be executed after F1 if F2 is pushed after F1. On the other hand, if F1 and F2 both only use (read) V2, their actual execution order could be random.
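The dependency rule can be written down as a small predicate. This is a minimal Python sketch of the rule as stated in this section, not the engine's actual implementation; the dictionaries and the names F3, F4, V1, V3, and V4 are made up for illustration.

```python
from itertools import combinations

def must_serialize(a, b):
    """Two pushed functions conflict when either one mutates a variable the other touches."""
    a_all = a["const"] | a["mutate"]
    b_all = b["const"] | b["mutate"]
    return bool(a["mutate"] & b_all or b["mutate"] & a_all)

# Functions listed in push order, with the variables they read (const) and modify (mutate).
pushed = [
    {"name": "F1", "const": set(), "mutate": {"V2"}},
    {"name": "F2", "const": set(), "mutate": {"V2"}},
    {"name": "F3", "const": {"V1"}, "mutate": {"V3"}},
    {"name": "F4", "const": {"V1"}, "mutate": {"V4"}},
]

ordered_pairs = [
    (a["name"], b["name"])
    for a, b in combinations(pushed, 2)
    if must_serialize(a, b)
]
print(ordered_pairs)
```

Only F1 and F2 must run in push order, because both mutate V2. F3 and F4 merely read the same variable V1, so the rule leaves their relative order open.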

Push and Wait

Push and Wait are two more useful APIs of the execution engine.

Following are two important features of the Push API −

All the Push APIs are asynchronous, which means that the API call returns immediately regardless of whether the pushed function has finished or not.
The Push API is not thread-safe, which means that only one thread should make engine API calls at a time.

The Wait API can be summarized as follows −

If a user wants to wait for a specific function to finish, they should include a callback function in the closure and call it at the end of the function.
If a user wants to wait for all functions that involve a certain variable to finish, they should use the WaitForVar(var) API.
If a user wants to wait for all the pushed functions to finish, they should use the WaitForAll() API.
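To make the push and wait behaviour concrete, here is a toy engine in Python. It is a sketch under simplifying assumptions (a thread pool and per-variable bookkeeping), not how MXNet implements its engine; `ToyEngine` and its method names are hypothetical, chosen to mirror the API names above.

```python
from concurrent.futures import ThreadPoolExecutor, wait

class ToyEngine:
    """Toy model: push() returns at once; conflicting functions run in push order."""

    def __init__(self, workers=4):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._last_write = {}   # variable -> future of its most recent writer
        self._reads = {}        # variable -> futures of readers since that write
        self._all = []

    def push(self, fn, const_vars=(), mutate_vars=()):
        # A reader waits for the last writer; a writer waits for readers and writers.
        deps = [self._last_write[v] for v in const_vars if v in self._last_write]
        for v in mutate_vars:
            deps += self._reads.get(v, [])
            if v in self._last_write:
                deps.append(self._last_write[v])

        def run():
            wait(deps)          # block until every earlier conflicting function is done
            fn()

        fut = self._pool.submit(run)
        for v in const_vars:
            self._reads.setdefault(v, []).append(fut)
        for v in mutate_vars:
            self._last_write[v] = fut
            self._reads[v] = []
        self._all.append(fut)
        return fut

    def wait_for_var(self, var):
        pending = list(self._reads.get(var, []))
        if var in self._last_write:
            pending.append(self._last_write[var])
        wait(pending)

    def wait_for_all(self):
        wait(self._all)

engine = ToyEngine()
log = []
engine.push(lambda: log.append("F1"), mutate_vars=["V2"])
engine.push(lambda: log.append("F2"), mutate_vars=["V2"])
engine.push(lambda: log.append("F3"), const_vars=["V2"])
engine.wait_for_all()
print(log)
```

As with the real Push API described above, this toy `push` is meant to be called from a single thread.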

Operators

An operator in Apache MXNet is a class that contains the actual computation logic as well as auxiliary information that helps the system perform optimization.

Operator Interface

Forward is the core operator interface. Its syntax is as follows −

virtual void Forward(const OpContext &ctx,
   const std::vector<TBlob> &in_data,
   const std::vector<OpReqType> &req,
   const std::vector<TBlob> &out_data,
   const std::vector<TBlob> &aux_states) = 0;

The structure of OpContext, used in Forward(), is as follows −

struct OpContext {
   int is_train;
   RunContext run_ctx;
   std::vector<Resource> requested;
};

The OpContext describes the state of the operator (whether it is in the train or test phase), which device the operator should run on, and the requested resources.

The remaining arguments of the Forward core interface can be understood as follows −

in_data and out_data represent the input and output tensors.
req denotes how the result of the computation is written into out_data.

OpReqType is defined as −

enum OpReqType {
   kNullOp,
   kWriteTo,
   kWriteInplace,
   kAddTo
};
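The effect of each request type on the output can be sketched in Python. This is a toy model using plain lists; the helper `assign` is hypothetical. It assumes that kWriteInplace differs from kWriteTo only in where the destination memory comes from, so both are treated as an overwrite here.

```python
from enum import Enum

class OpReqType(Enum):
    kNullOp = 0
    kWriteTo = 1
    kWriteInplace = 2
    kAddTo = 3

def assign(out, req, values):
    """Write `values` into the list `out` according to the request type `req`."""
    if req is OpReqType.kNullOp:
        return                                          # nothing is written
    if req in (OpReqType.kWriteTo, OpReqType.kWriteInplace):
        out[:] = values                                 # overwrite the destination
    elif req is OpReqType.kAddTo:
        out[:] = [o + v for o, v in zip(out, values)]   # accumulate into the destination

grad = [1.0, 2.0]
assign(grad, OpReqType.kAddTo, [10.0, 20.0])
print(grad)
```

Accumulation (kAddTo) is the case that matters when several computations contribute to the same gradient buffer.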

As with Forward, we can optionally implement the Backward interface as follows −

virtual void Backward(const OpContext &ctx,
   const std::vector<TBlob> &out_grad,
   const std::vector<TBlob> &in_data,
   const std::vector<TBlob> &out_data,
   const std::vector<OpReqType> &req,
   const std::vector<TBlob> &in_grad,
   const std::vector<TBlob> &aux_states);

Various tasks

The Operator interface allows users to do the following tasks −

Specify in-place updates to reduce memory allocation cost.
Hide some internal arguments from Python to make the interface cleaner.
Define the relationship between the input tensors and the output tensors.
Acquire additional temporary space from the system to perform computation.

Operator Property

In a convolutional neural network (CNN), a convolution can have several implementations. To achieve the best performance, we might want to switch among those implementations.

That is why Apache MXNet separates the operator semantic interface from the implementation interface. This separation takes the form of the OperatorProperty class, which consists of the following −

InferShape − The InferShape interface has two purposes, as given below:

The first purpose is to tell the system the size of each input and output tensor, so that the space can be allocated before the Forward and Backward calls.
The second purpose is to perform a size check to make sure that there is no error before running.

The syntax is given below −

virtual bool InferShape(mxnet::ShapeVector *in_shape,
   mxnet::ShapeVector *out_shape,
   mxnet::ShapeVector *aux_shape) const = 0;
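Both purposes of InferShape, computing the output size and checking the inputs, can be shown with a small Python sketch for a fully connected layer. The function name and the assumed layouts, data as (batch, in_dim) and weight as (num_hidden, in_dim), are illustrative assumptions rather than MXNet's actual code.

```python
def infer_fc_shape(data_shape, weight_shape):
    """Return the output shape of a fully connected layer, or raise on a mismatch."""
    batch, in_dim = data_shape
    num_hidden, w_in_dim = weight_shape
    if in_dim != w_in_dim:
        # Size check: fail before any computation runs.
        raise ValueError(f"data has {in_dim} features but weight expects {w_in_dim}")
    # Output size: known up front, so memory can be allocated before Forward.
    return (batch, num_hidden)

out_shape = infer_fc_shape((32, 100), (10, 100))
print(out_shape)
```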

Request Resource − What if the system could manage the computation workspace for operations like cudnnConvolutionForward? It could then perform optimizations such as reusing the space. MXNet achieves this with the help of the following two interfaces −

virtual std::vector<ResourceRequest> ForwardResource(
   const mxnet::ShapeVector &in_shape) const;
virtual std::vector<ResourceRequest> BackwardResource(
   const mxnet::ShapeVector &in_shape) const;

What happens if ForwardResource and BackwardResource return non-empty arrays? In that case, the system offers the corresponding resources through the ctx parameter of the Forward and Backward interfaces of Operator.

Backward dependency − Consider the following two pairs of operator signatures, which differ in their backward dependencies −

void FullyConnectedForward(TBlob weight, TBlob in_data, TBlob out_data);
void FullyConnectedBackward(TBlob weight, TBlob in_data, TBlob out_grad, TBlob in_grad);
void PoolingForward(TBlob in_data, TBlob out_data);
void PoolingBackward(TBlob in_data, TBlob out_data, TBlob out_grad, TBlob in_grad);

Here, there are two important points to note −

The out_data in FullyConnectedForward is not used by FullyConnectedBackward, and
PoolingBackward requires all the arguments of PoolingForward.

That is why, for FullyConnectedForward, the out_data tensor can be safely freed once it has been consumed, because the backward function will not need it. With this information, the system can garbage-collect some tensors as early as possible.

In place Option − Apache MXNet provides users with another interface to save the cost of memory allocation. The interface is appropriate for element-wise operations in which the input and output tensors have the same shape.

Following is the syntax for specifying the in-place update −

virtual std::vector<std::pair<int, void*>> ElewiseOpProperty::ForwardInplaceOption(
   const std::vector<int> &in_data,
   const std::vector<void*> &out_data)
const {
   return { {in_data[0], out_data[0]} };
}
virtual std::vector<std::pair<int, void*>> ElewiseOpProperty::BackwardInplaceOption(
   const std::vector<int> &out_grad,
   const std::vector<int> &in_data,
   const std::vector<int> &out_data,
   const std::vector<void*> &in_grad)
const {
   return { {out_grad[0], in_grad[0]} };
}

Example for Creating an Operator

With the help of OperatorProperty we can create an operator. To do so, follow the steps given below −

Step 1

Create Operator

First, implement the following interface in OperatorProperty:

virtual Operator* CreateOperator(Context ctx) const = 0;

An example is given below −

class ConvolutionOp : public Operator {
   public:
      void Forward( ... ) { ... }
      void Backward( ... ) { ... }
};
class ConvolutionOpProperty : public OperatorProperty {
   public:
      Operator* CreateOperator(Context ctx) const {
         return new ConvolutionOp;
      }
};

Step 2

Parameterize Operator

If you are going to implement a convolution operator, you must know the kernel size, the stride size, the padding size, and so on, because these parameters have to be passed to the operator before any Forward or Backward interface is called.

For this, we need to define a ConvolutionParam structure as below −

#include <dmlc/parameter.h>
struct ConvolutionParam : public dmlc::Parameter<ConvolutionParam> {
   mxnet::TShape kernel, stride, pad;
   uint32_t num_filter, num_group, workspace;
   bool no_bias;
};

Now, we need to put this in ConvolutionOpProperty and pass it to the operator as follows −

class ConvolutionOp : public Operator {
   public:
      ConvolutionOp(ConvolutionParam p): param_(p) {}
      void Forward( ... ) { ... }
      void Backward( ... ) { ... }
   private:
      ConvolutionParam param_;
};
class ConvolutionOpProperty : public OperatorProperty {
   public:
      void Init(const vector<pair<string, string>>& kwargs) {
         // initialize param_ using kwargs
      }
      Operator* CreateOperator(Context ctx) const {
         return new ConvolutionOp(param_);
      }
   private:
      ConvolutionParam param_;
};
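The Init step above receives its arguments as (name, value) string pairs and has to turn them into typed fields. The Python sketch below illustrates that idea only. The default values, the reduced set of fields, and the helper `init_param` are assumptions for illustration, and the real parsing is handled by dmlc::Parameter.

```python
from ast import literal_eval
from dataclasses import dataclass

@dataclass
class ConvolutionParam:
    # Defaults are made up for this sketch.
    kernel: tuple = (1, 1)
    stride: tuple = (1, 1)
    pad: tuple = (0, 0)
    num_filter: int = 1
    no_bias: bool = False

def init_param(kwargs):
    """Build a ConvolutionParam from (name, value) string pairs, rejecting unknown names."""
    param = ConvolutionParam()
    for name, text in kwargs:
        if not hasattr(param, name):
            raise KeyError(f"unknown parameter: {name}")
        setattr(param, name, literal_eval(text))   # "(3, 3)" -> (3, 3), "64" -> 64
    return param

param = init_param([("kernel", "(3, 3)"), ("num_filter", "64"), ("no_bias", "True")])
print(param)
```

Once the parameter object is filled in, it can be handed to the operator's constructor, as CreateOperator does with param_ above.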

Step 3

Register the Operator Property Class and the Parameter Class to Apache MXNet

Finally, we need to register the Operator Property class and the Parameter class with MXNet. This can be done with the following macros −

DMLC_REGISTER_PARAMETER(ConvolutionParam);
MXNET_REGISTER_OP_PROPERTY(Convolution, ConvolutionOpProperty);

In the second macro above, the first argument is the name string and the second is the property class name.

Apache MXNet - Unified Operator API

This chapter provides information about the unified operator application programming interface (API) in Apache MXNet.

SimpleOp

SimpleOp is a new unified operator API that unifies the different invoking processes and returns to the fundamental elements of operators. The unified operator is designed specifically for unary and binary operations, because most mathematical operators take one or two operands, and with more operands dependency-related optimization becomes useful.

We will see how the SimpleOp unified operator works with the help of an example. In this example, we create an operator that acts as a smooth l1 loss, which is a mixture of l1 and l2 loss. The loss and its gradient can be written as follows −

loss = outside_weight .* f(inside_weight .* (data - label))
grad = outside_weight .* inside_weight .* f'(inside_weight .* (data - label))

In the above example,

.* stands for element-wise multiplication.
f is the smooth l1 loss function and f' is its derivative. We assume that both are available in mshadow.

It looks impossible to implement this particular loss as a unary or binary operator, but MXNet provides automatic differentiation in symbolic execution, which reduces the loss to f and f' directly. That is why we can implement this particular loss as a unary operator.

Defining Shapes

MXNet's mshadow library requires explicit memory allocation, so we need to provide all data shapes before any calculation occurs. Before defining functions and gradients, we need to check input shape consistency and provide the output shape, using the following function types:

typedef mxnet::TShape (*UnaryShapeFunction)(const mxnet::TShape& src,
   const EnvArguments& env);
typedef mxnet::TShape (*BinaryShapeFunction)(const mxnet::TShape& lhs,
   const mxnet::TShape& rhs,
   const EnvArguments& env);

A shape function, which returns an mxnet::TShape, checks the input data shape and designates the output data shape. If you do not define this function, the default output shape is the same as the input shape. For a binary operator, for example, the shapes of lhs and rhs are checked to be the same by default.

Now let's move on to our smooth l1 loss example. For this, we need to define XPU as cpu or gpu in the header implementation smooth_l1_unary-inl.h. The reason is to reuse the same code in smooth_l1_unary.cc and smooth_l1_unary.cu.

#include <mxnet/operator_util.h>
#if defined(__CUDACC__)
   #define XPU gpu
#else
   #define XPU cpu
#endif

In our smooth l1 loss example, the output has the same shape as the source, so we can use the default behavior. Written out explicitly, it looks as follows −

inline mxnet::TShape SmoothL1Shape_(const mxnet::TShape& src,const EnvArguments& env) {
   return mxnet::TShape(src);
}

Defining Functions

We can create a unary or binary function with one output as follows −

typedef void (*UnaryFunction)(const TBlob& src,
   const EnvArguments& env,
   TBlob* ret,
   OpReqType req,
   RunContext ctx);
typedef void (*BinaryFunction)(const TBlob& lhs,
   const TBlob& rhs,
   const EnvArguments& env,
   TBlob* ret,
   OpReqType req,
   RunContext ctx);

Following is the RunContext struct, which contains the information needed at runtime for execution −

struct RunContext {
   void *stream; // the stream of the device, can be NULL or Stream<gpu>* in GPU mode
   template<typename xpu> inline mshadow::Stream<xpu>* get_stream(); // get mshadow stream from Context
};

Now, let's see how the computation results are written into ret. This is controlled by OpReqType −

enum OpReqType {
   kNullOp, // no operation, do not write anything
   kWriteTo, // write gradient to provided space
   kWriteInplace, // perform an in-place write
   kAddTo // add to the provided space
};

Now, let's return to our smooth l1 loss example. We will use UnaryFunction to define the function of this operator as follows:

template<typename xpu>
void SmoothL1Forward_(const TBlob& src,
   const EnvArguments& env,
   TBlob *ret,
   OpReqType req,
   RunContext ctx) {
   using namespace mshadow;
   using namespace mshadow::expr;
   mshadow::Stream<xpu> *s = ctx.get_stream<xpu>();
   real_t sigma2 = env.scalar * env.scalar;
   MSHADOW_TYPE_SWITCH(ret->type_flag_, DType, {
      mshadow::Tensor<xpu, 2, DType> out = ret->get<xpu, 2, DType>(s);
      mshadow::Tensor<xpu, 2, DType> in = src.get<xpu, 2, DType>(s);
      ASSIGN_DISPATCH(out, req,
         F<mshadow_op::smooth_l1_loss>(in, ScalarExp<DType>(sigma2)));
   });
}

Defining Gradients

The gradient functions of binary operators have a similar structure, except that Input, TBlob, and OpReqType are doubled. Below are the gradient function types for unary operators, which differ in the kinds of input they take:

// depending only on out_grad
typedef void (*UnaryGradFunctionT0)(const OutputGrad& out_grad,
   const EnvArguments& env,
   TBlob* in_grad,
   OpReqType req,
   RunContext ctx);
// depending only on out_value
typedef void (*UnaryGradFunctionT1)(const OutputGrad& out_grad,
   const OutputValue& out_value,
   const EnvArguments& env,
   TBlob* in_grad,
   OpReqType req,
   RunContext ctx);
// depending only on in_data
typedef void (*UnaryGradFunctionT2)(const OutputGrad& out_grad,
   const Input0& in_data0,
   const EnvArguments& env,
   TBlob* in_grad,
   OpReqType req,
   RunContext ctx);

Input0, Input, OutputValue, and OutputGrad all share the structure of GradFunctionArgument, which is defined as follows −

struct GradFunctionArgument {
   TBlob data;
};

Now let's return to our smooth l1 loss example. To enable the chain rule of the gradient, we need to multiply out_grad from the top into the result of in_grad.

template<typename xpu>
void SmoothL1BackwardUseIn_(const OutputGrad& out_grad, const Input0& in_data0,
   const EnvArguments& env,
   TBlob *in_grad,
   OpReqType req,
   RunContext ctx) {
   using namespace mshadow;
   using namespace mshadow::expr;
   mshadow::Stream<xpu> *s = ctx.get_stream<xpu>();
   real_t sigma2 = env.scalar * env.scalar;
   MSHADOW_TYPE_SWITCH(in_grad->type_flag_, DType, {
      mshadow::Tensor<xpu, 2, DType> src = in_data0.data.get<xpu, 2, DType>(s);
      mshadow::Tensor<xpu, 2, DType> ograd = out_grad.data.get<xpu, 2, DType>(s);
      mshadow::Tensor<xpu, 2, DType> igrad = in_grad->get<xpu, 2, DType>(s);
      ASSIGN_DISPATCH(igrad, req,
         ograd * F<mshadow_op::smooth_l1_gradient>(src, ScalarExp<DType>(sigma2)));
   });
}

Register SimpleOp to MXNet

Once we have created the shape, function, and gradient, we need to register them as both an NDArray operator and a symbolic operator. For this, we can use the registration macro as follows −

MXNET_REGISTER_SIMPLE_OP(Name, DEV)
   .set_shape_function(Shape)
   .set_function(DEV::kDevMask, Function<XPU>, SimpleOpInplaceOption)
   .set_gradient(DEV::kDevMask, Gradient<XPU>, SimpleOpInplaceOption)
   .describe("description");

SimpleOpInplaceOption is defined as follows −

enum SimpleOpInplaceOption {
   kNoInplace, // do not allow inplace in arguments
   kInplaceInOut, // allow inplace in with out (unary)
   kInplaceOutIn, // allow inplace out_grad with in_grad (unary)
   kInplaceLhsOut, // allow inplace left operand with out (binary)
   kInplaceOutLhs // allow inplace out_grad with lhs_grad (binary)
};

Now let's return to our smooth l1 loss example. Here, the gradient function relies on the input data, so the function cannot be written in place.

MXNET_REGISTER_SIMPLE_OP(smooth_l1, XPU)
.set_function(XPU::kDevMask, SmoothL1Forward_<XPU>, kNoInplace)
.set_gradient(XPU::kDevMask, SmoothL1BackwardUseIn_<XPU>, kInplaceOutIn)
.set_enable_scalar(true)
.describe("Calculate Smooth L1 Loss(lhs, scalar)");

SimpleOp on EnvArguments

Some operations might need the following −

A scalar as input, such as a gradient scale.
A set of keyword arguments controlling behavior.
A temporary space to speed up calculations.

The benefit of using EnvArguments is that it provides additional arguments and resources to make calculations more scalable and efficient.

Example

First, let's define the struct as below −

struct EnvArguments {
   real_t scalar; // scalar argument, if enabled
   std::vector<std::pair<std::string, std::string> > kwargs; // keyword arguments
   std::vector<Resource> resource; // pointer to the resources requested
};

Next, we need to request additional resources, such as mshadow::Random<xpu> and temporary memory space, from EnvArguments.resource. The request is declared as follows −

struct ResourceRequest {
   enum Type { // Resource type, indicating what the pointer type is
      kRandom, // mshadow::Random<xpu> object
      kTempSpace // A dynamic temp space that can be arbitrary size
   };
   Type type; // type of resources
};

Now, the registration will request the declared resources from mxnet::ResourceManager. After that, it places them in the std::vector<Resource> resource field of EnvArguments.

We can access the resources with the help of the following code −

auto tmp_space_res = env.resource[0].get_space(some_shape, some_stream);
auto rand_res = env.resource[0].get_random(some_stream);

In our smooth l1 loss example, a scalar input is needed to mark the turning point of the loss function. That is why we use set_enable_scalar(true) in the registration process, and env.scalar in the function and gradient declarations.

Building Tensor Operation

Why do we need to craft tensor operations? The reasons are as follows −

Computation uses the mshadow library, and sometimes the functions we need are not readily available.
Some operations, such as softmax loss and its gradient, are not done in an element-wise way.

Example

Here, we use the smooth l1 loss example from above. We need two mappers, namely the scalar cases of the smooth l1 loss and its gradient. The loss mapper is shown below:

namespace mshadow_op {
   struct smooth_l1_loss {
      // a is x, b is sigma2
      MSHADOW_XINLINE static real_t Map(real_t a, real_t b) {
         if (a > 1.0f / b) {
            return a - 0.5f / b;
         } else if (a < -1.0f / b) {
            return -a - 0.5f / b;
         } else {
            return 0.5f * a * a * b;
         }
      }
   };
}
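The scalar cases can be checked quickly in Python. The loss function below mirrors the mapper above branch by branch. The gradient function is derived here by differentiating each branch and is an assumption of this sketch, not code copied from MXNet.

```python
def smooth_l1_loss(a, sigma2):
    """Scalar smooth l1 loss: a is x, sigma2 is sigma squared."""
    if a > 1.0 / sigma2:
        return a - 0.5 / sigma2
    if a < -1.0 / sigma2:
        return -a - 0.5 / sigma2
    return 0.5 * a * a * sigma2

def smooth_l1_gradient(a, sigma2):
    """Derivative of smooth_l1_loss with respect to a, taken branch by branch."""
    if a > 1.0 / sigma2:
        return 1.0
    if a < -1.0 / sigma2:
        return -1.0
    return a * sigma2

print(smooth_l1_loss(2.0, 1.0), smooth_l1_loss(0.5, 1.0))
print(smooth_l1_gradient(2.0, 1.0), smooth_l1_gradient(0.5, 1.0))
```

The quadratic branch covers small inputs (l2 behaviour) and the linear branches cover large ones (l1 behaviour), with sigma2 setting where the switch happens.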

Apache MXNet - Distributed Training

This chapter is about distributed training in Apache MXNet. Let us start by understanding the modes of computation in MXNet.

Modes of Computation

MXNet, a multi-language ML library, offers its users the following two modes of computation −

Imperative mode

This mode of computation exposes an interface similar to the NumPy API. For example, in MXNet, the following imperative code constructs a tensor of zeros on the CPU and another on the GPU −

import mxnet as mx
tensor_cpu = mx.nd.zeros((100,), ctx=mx.cpu())
tensor_gpu = mx.nd.zeros((100,), ctx=mx.gpu(0))

As the code above shows, MXNet lets us specify where the tensor is held, on either a CPU or a GPU device. In the example above, the GPU tensor is placed on the GPU with index 0. MXNet achieves very high device utilization because all computations happen lazily rather than instantaneously.

Symbolic mode

Although the imperative mode is quite useful, one of its drawbacks is its rigidity, i.e. all the computations need to be known beforehand, along with pre-defined data structures.

Symbolic mode, on the other hand, exposes a computation graph like TensorFlow does. It removes the drawback of the imperative API by allowing MXNet to work with symbols or variables instead of fixed, pre-defined data structures. The symbols can then be interpreted as a set of operations as follows −

import mxnet as mx
x = mx.sym.Variable("X")
y = mx.sym.Variable("Y")
z = (x+y)
m = z/100
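
To make the idea of deferred evaluation concrete, here is a minimal pure-Python sketch of a symbolic expression graph. It is only an illustration and not how MXNet implements symbols; the Node, Variable, Constant and Op classes and the evaluate method are hypothetical names introduced for this example.

```python
# Toy symbolic graph: building an expression records operations,
# and nothing is computed until evaluate() is called with concrete values.
# All class and method names here are hypothetical, not MXNet API.

class Node:
    def __add__(self, other):
        return Op("add", self, wrap(other))

    def __truediv__(self, other):
        return Op("div", self, wrap(other))


class Variable(Node):
    def __init__(self, name):
        self.name = name

    def evaluate(self, bindings):
        return bindings[self.name]


class Constant(Node):
    def __init__(self, value):
        self.value = value

    def evaluate(self, bindings):
        return self.value


class Op(Node):
    def __init__(self, kind, left, right):
        self.kind, self.left, self.right = kind, left, right

    def evaluate(self, bindings):
        a = self.left.evaluate(bindings)
        b = self.right.evaluate(bindings)
        return a + b if self.kind == "add" else a / b


def wrap(value):
    # Turn plain numbers into Constant nodes so they can join the graph
    return value if isinstance(value, Node) else Constant(value)


x = Variable("X")
y = Variable("Y")
m = (x + y) / 100          # builds a graph; no arithmetic happens here
result = m.evaluate({"X": 30.0, "Y": 20.0})
print(result)  # 0.5
```

The same graph can be evaluated many times with different bindings, which is what makes it possible for a framework to analyse the whole computation before running it.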

Kinds of Parallelism

Apache MXNet supports distributed training, which lets us use multiple machines for faster and more efficient training.

The workload of training a neural network can be distributed across multiple CPU or GPU devices in the following two ways −

Data Parallelism

In this kind of parallelism, each device stores a complete copy of the model and works on a different part of the dataset. The devices collectively update a shared model. All the devices can sit on a single machine or be spread across multiple machines.
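
The following pure-Python sketch illustrates the arithmetic behind data parallelism for a one-parameter linear model. It is a simplified illustration under stated assumptions (equal-sized shards, a mean squared error loss, made-up data), not MXNet code: each simulated device computes a gradient on its own shard, and the averaged gradient matches the gradient over the full batch.

```python
# Toy data-parallel step for the model y = w * x with a mean squared error loss.
# The data, the gradient() helper and the shard layout are illustrative assumptions.

def gradient(w, xs, ys):
    # d/dw of mean((w * x - y)^2) over the given samples
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.0
num_devices = 2

# Each "device" holds a full copy of w and sees a different shard of the batch
shard = len(xs) // num_devices
grads = [
    gradient(w, xs[i * shard:(i + 1) * shard], ys[i * shard:(i + 1) * shard])
    for i in range(num_devices)
]

# The devices update the shared model with the averaged gradient
avg_grad = sum(grads) / num_devices
full_grad = gradient(w, xs, ys)
print(grads, avg_grad, full_grad)  # [-10.0, -50.0] -30.0 -30.0
```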

Model Parallelism

This kind of parallelism comes in handy when a model is too large to fit into the memory of a single device. In model parallelism, different devices are assigned the task of learning different parts of the model. Note that Apache MXNet currently supports model parallelism on a single machine only.

Working of distributed training

The concepts given below are key to understanding how distributed training works in Apache MXNet −

Types of processes

Processes communicate with each other to accomplish the training of a model. Apache MXNet has the following three types of processes −

Worker

A worker node performs training on a batch of training samples. Before processing each batch, the worker pulls the weights from the servers. Once the batch is processed, the worker sends the gradients to the servers.

Server

MXNet can have multiple servers, which store the model’s parameters and communicate with the worker nodes.

Scheduler

The role of the scheduler is to set up the cluster, which includes waiting for a message from each node saying that it has come up and which port it is listening on. After setting up the cluster, the scheduler lets all the processes know about every other node in the cluster, so that the processes can communicate with each other. There is only one scheduler.

KV Store

KVStore stands for Key-Value store. It is a critical component of multi-device training, because parameters are communicated across devices, on a single machine as well as across multiple machines, through one or more servers that hold the parameters in a KVStore. The following points describe how KVStore works −

Each entry in KVStore consists of a key and a value.
Each parameter array in the network is assigned a key, and the weights of that parameter array are the value stored under it.
The worker nodes push gradients after processing a batch and pull the updated weights before processing a new batch.
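
The push and pull cycle described in the points above can be sketched with a small in-process stand-in. This is a toy model under simplifying assumptions (a single server, plain Python lists, a fixed SGD step applied on the server); ToyKVStore and the key name are hypothetical and not part of the MXNet API.

```python
# Toy key-value parameter store: workers push gradients, the store
# aggregates them and updates the weights, and workers pull the result.
# ToyKVStore is a hypothetical illustration, not MXNet's KVStore.

class ToyKVStore:
    def __init__(self, learning_rate):
        self.learning_rate = learning_rate
        self.weights = {}

    def init(self, key, value):
        # Register a parameter array under a key
        self.weights[key] = list(value)

    def push(self, key, worker_grads):
        # Sum the gradients pushed by all workers, then apply one SGD step
        total = [sum(g) for g in zip(*worker_grads)]
        self.weights[key] = [
            w - self.learning_rate * g
            for w, g in zip(self.weights[key], total)
        ]

    def pull(self, key):
        # Return a copy of the current weights for this key
        return list(self.weights[key])


store = ToyKVStore(learning_rate=0.5)
store.init("dense0_weight", [1.0, 2.0])
store.push("dense0_weight", [[0.5, 1.0], [1.5, 1.0]])  # gradients from two workers
updated = store.pull("dense0_weight")
print(updated)  # [0.0, 1.0]
```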

The notion of a KVStore server exists only during distributed training. Distributed mode is enabled by calling the mxnet.kvstore.create function with a string argument that contains the word dist −

import mxnet
kv = mxnet.kvstore.create('dist_sync')

Distribution of Keys

The servers do not each need to store every parameter array or key; the keys are distributed across the servers. KVStore handles this distribution transparently, and the decision of which server stores a specific key is made at random.

As discussed above, KVStore ensures that whenever a key is pulled, the request is sent to the server that holds the corresponding value. If the value of a key is very large, it may be sharded across different servers.
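
As a rough illustration of the two ideas above, the sketch below assigns keys to servers at random and splits one large value into contiguous chunks. The helper names, the seed and the chunking rule are assumptions made for this example; the actual placement logic inside KVStore is not shown in this tutorial.

```python
import random

# Hypothetical helpers illustrating key placement and value sharding.
# They do not reproduce KVStore's internal logic.

def place_keys(keys, num_servers, seed=0):
    # Pick a server index at random for every key
    rng = random.Random(seed)
    return {key: rng.randrange(num_servers) for key in keys}


def split_value(value, num_servers):
    # Split a long list into at most num_servers contiguous chunks
    chunk = -(-len(value) // num_servers)  # ceiling division
    return [value[i:i + chunk] for i in range(0, len(value), chunk)]


placement = place_keys(["w0", "b0", "w1"], num_servers=2)
chunks = split_value(list(range(10)), num_servers=3)
print(chunks)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```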

Split training data

When running distributed training in data parallel mode, we want each machine to work on a different part of the dataset. For data parallel training on a single worker, we can use mxnet.gluon.utils.split_and_load to split a batch of samples provided by the data iterator and load each part of the batch onto the device that will process it.

In distributed training, on the other hand, we first need to divide the dataset into n different parts so that every worker gets a different part. Each worker can then use split_and_load to divide its part of the dataset across the devices on that machine. All of this happens through the data iterator. mxnet.io.MNISTIterator and mxnet.io.ImageRecordIter are two iterators in MXNet that support this feature.
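
The two-level split described above can be sketched in plain Python. This is an illustration only: the partition and split_for_devices helpers and their parameter names are hypothetical, and the strided assignment is one possible way to give each worker a distinct part.

```python
# Two-level split: first across workers, then across the devices of one worker.
# Helper names and the strided partitioning scheme are illustrative assumptions.

def partition(samples, num_parts, part_index):
    # Worker number part_index takes every num_parts-th sample
    return samples[part_index::num_parts]


def split_for_devices(batch, num_devices):
    # Cut a worker's batch into equal slices, one per local device
    size = len(batch) // num_devices
    return [batch[i * size:(i + 1) * size] for i in range(num_devices)]


dataset = list(range(12))
worker_parts = [partition(dataset, num_parts=3, part_index=r) for r in range(3)]
device_slices = split_for_devices(worker_parts[0], num_devices=2)

print(worker_parts)   # [[0, 3, 6, 9], [1, 4, 7, 10], [2, 5, 8, 11]]
print(device_slices)  # [[0, 3], [6, 9]]
```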

Weights updating

KVStore supports the following two modes for updating the weights −

In the first mode, the server aggregates the gradients and updates the weights using them.
In the second mode, the server only aggregates the gradients; each worker then pulls the aggregated gradients and updates the weights locally.

If you are using Gluon, you can choose between these modes by passing the update_on_kvstore argument when creating the Trainer object, as follows −

# net, kv and the opt settings object are assumed to be defined earlier
trainer = gluon.Trainer(net.collect_params(), optimizer='sgd',
   optimizer_params={'learning_rate': opt.lr,
      'wd': opt.wd,
      'momentum': opt.momentum,
      'multi_precision': True},
   kvstore=kv,
   update_on_kvstore=True)

Modes of Distributed Training

If the KVStore creation string contains the word dist, distributed training is enabled. The different types of KVStore enable the following modes of distributed training −

dist_sync

As the name implies, this mode denotes synchronous distributed training. All the workers use the same synchronised set of model parameters at the start of every batch.

The drawback of this mode is that, after each batch, the server has to wait until it has received gradients from every worker before it updates the model parameters. This means that if one worker crashes, the progress of all workers halts.

dist_async

As the name implies, this mode denotes asynchronous distributed training. The server receives gradients from one worker and immediately updates its store. It then uses the updated store to respond to any further pulls.

The advantage over dist_sync mode is that a worker that finishes processing a batch can pull the current parameters from the server and start the next batch, even if other workers have not yet finished processing the earlier batch. This mode is faster than dist_sync because there is no synchronisation cost, but it can take more epochs to converge.
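
The trade-off between the two modes can be seen in a small deterministic simulation. The sketch below is a toy with one scalar weight, two simulated workers and a hand-picked learning rate; it is not MXNet code. In the asynchronous case the slow worker's gradient was computed from a weight that is already out of date, which is one reason asynchronous training may need more epochs.

```python
# Toy comparison of synchronous and asynchronous updates on the loss (w - 3)^2.
# The loss, learning rate and arrival order are illustrative assumptions.

def grad(w):
    return 2.0 * (w - 3.0)   # gradient of (w - 3)^2


lr = 0.5

# Synchronous: both workers pull the same weight, the server waits for both
# gradients, averages them and updates once per round.
w_sync = 0.0
for _ in range(2):
    pulled = [w_sync, w_sync]
    w_sync -= lr * sum(grad(w) for w in pulled) / len(pulled)

# Asynchronous: the server applies each gradient as soon as it arrives.
w_async = 0.0
stale = w_async                 # slow worker pulled before the fast one pushed
w_async -= lr * grad(w_async)   # fast worker pushes; store is updated at once
fresh = w_async                 # fast worker pulls the new weight and moves on
w_async -= lr * grad(stale)     # slow worker's gradient is based on a stale weight

print(w_sync, fresh, w_async)  # 3.0 3.0 6.0
```

Here the synchronous run settles on the minimum at 3.0, while the stale gradient pushes the asynchronous run past it to 6.0.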

dist_sync_device

This mode is the same as dist_sync mode, with one difference: when multiple GPUs are used on every node, dist_sync_device aggregates gradients and updates weights on the GPU, whereas dist_sync does so in CPU memory.

This reduces expensive communication between GPU and CPU, which is why it is faster than dist_sync. The drawback is that it increases memory usage on the GPU.

dist_async_device

This mode works in the same way as dist_sync_device mode, but asynchronously.

Apache MXNet - Python Packages

In this chapter, we will learn about the Python packages available in Apache MXNet.

Important MXNet Python packages

MXNet has the following important Python packages, which we will discuss one by one −

Autograd (Automatic Differentiation)
NDArray
KVStore
Gluon
Visualization

Let us start with the Autograd Python package for Apache MXNet.

Autograd

Autograd stands for automatic differentiation, which is used to backpropagate the gradients from the loss metric to each of the parameters. During backpropagation it uses a dynamic programming approach to calculate the gradients efficiently. It is also called reverse mode automatic differentiation. This technique is very efficient in ‘fan-in’ situations, where many parameters affect a single loss metric.

What are gradients?

Gradients are fundamental to the process of training a neural network. They tell us how to change the parameters of the network to improve its performance.

Neural networks (NN) are composed of operators such as sums, products, convolutions, etc. These operators use parameters in their computations, such as the weights in convolution kernels. We have to find the optimal values for these parameters, and gradients show us the way towards that solution.

We are interested in the effect that changing a parameter has on the performance of the network. Gradients tell us how much a given variable increases or decreases when we change a variable it depends on. Performance is usually defined through a loss metric that we try to minimise. For example, for regression we might minimise the L2 loss between our predictions and the exact values, whereas for classification we might minimise the cross-entropy loss.

Once we have calculated the gradient of the loss with respect to each parameter, we can use an optimiser, such as stochastic gradient descent, to update the parameters.
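
As a minimal sketch of what such an optimiser does with the gradients, the following plain-Python function applies one step of basic gradient descent. The sgd_step helper and the numbers are hypothetical, and momentum and weight decay are left out for brevity.

```python
# One plain gradient-descent step: move each parameter against its gradient.
# sgd_step is a hypothetical helper, not an MXNet function.

def sgd_step(params, grads, learning_rate):
    return [p - learning_rate * g for p, g in zip(params, grads)]


params = [1.0, -2.0]
grads = [0.5, -1.0]
new_params = sgd_step(params, grads, learning_rate=0.5)
print(new_params)  # [0.75, -1.5]
```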

How to calculate gradients?

We have the following options for calculating gradients −

Symbolic Differentiation − The first option is symbolic differentiation, which derives a formula for each gradient. The drawback of this method is that it quickly leads to extremely long formulas as the network gets deeper and the operators get more complex.
Finite Differencing − Another option is finite differencing, which tries slight changes to each parameter and observes how the loss metric responds. The drawback of this method is that it is computationally expensive and may have poor numerical precision.
Automatic differentiation − The solution to the drawbacks of the above methods is to use automatic differentiation to backpropagate the gradients from the loss metric to each of the parameters. Backpropagation allows a dynamic programming approach to calculating the gradients efficiently. This method is also called reverse mode automatic differentiation.
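
To give a feel for the finite differencing option, the sketch below compares a central-difference estimate with the exact derivative of a small made-up function. The function, the step size eps and the helper names are assumptions chosen for this example. Note that the estimate needs two extra evaluations of the function per parameter, which is why the approach becomes expensive for networks with many parameters.

```python
# Finite differencing versus the exact derivative for f(w) = (3w - 6)^2.
# The function and the step size are illustrative choices.

def f(w):
    return (3.0 * w - 6.0) ** 2


def analytic_grad(w):
    # Chain rule: 2 * (3w - 6) * 3
    return 2.0 * (3.0 * w - 6.0) * 3.0


def finite_difference(func, w, eps=1e-5):
    # Central difference: two evaluations of func per parameter
    return (func(w + eps) - func(w - eps)) / (2.0 * eps)


w = 1.0
exact = analytic_grad(w)
approx = finite_difference(f, w)
error = abs(exact - approx)
print(exact)           # -18.0
print(error < 1e-4)    # True
```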

Automatic Differentiation (autograd)

Here, we will look at how autograd works in detail. It works in the following two stages −

Stage 1 − This stage is called the ‘Forward Pass’ of training. In this stage, autograd records the operators that the network uses to make predictions and calculate the loss metric.

Stage 2 − This stage is called the ‘Backward Pass’ of training. In this stage, autograd works backwards through the record, evaluating the partial derivatives of each operator all the way back to the network parameters.
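
The two stages can be sketched with a very small reverse-mode differentiation engine in plain Python. This is a teaching toy that supports only scalar addition and multiplication; the Value class is hypothetical and far simpler than MXNet's autograd, but it follows the same record-then-replay pattern.

```python
# Toy reverse-mode automatic differentiation for scalars.
# Forward pass: each operation records its inputs and local derivatives.
# Backward pass: the record is walked in reverse, applying the chain rule.
# The Value class is a hypothetical illustration, not MXNet code.

class Value:
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents   # tuples of (parent Value, local derivative)

    def __add__(self, other):
        return Value(self.data + other.data, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Value(self.data * other.data,
                     ((self, other.data), (other, self.data)))

    def backward(self):
        # Order the recorded graph so that every node comes after its inputs
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node._parents:
                parent.grad += local * node.grad


w = Value(2.0)
x = Value(3.0)
b = Value(1.0)
loss = w * x + b      # forward pass: operations are recorded
loss.backward()       # backward pass: walk the record in reverse
print(w.grad, x.grad, b.grad)  # 3.0 2.0 1.0
```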

Advantages of autograd

Following are the advantages of using automatic differentiation (autograd) −

Flexible − The flexibility it gives us when defining our network is one of the biggest benefits of autograd. We can change the operations on every iteration. Such networks are called dynamic graphs, and they are much more complex to implement in frameworks that require a static graph. Even in such cases, autograd is still able to backpropagate the gradients correctly.
Automatic − Autograd is automatic, i.e. it takes care of the complexities of the backpropagation procedure for you. We just need to specify which gradients we are interested in calculating.
Efficient − Autograd calculates the gradients very efficiently.
Can use native Python control flow operators − We can use native Python control flow such as if conditions and while loops. Autograd is still able to backpropagate the gradients efficiently and correctly.

Using autograd in MXNet Gluon

Here, with the help of an example, we will see how to use autograd in MXNet Gluon.

Implementation Example

In the following example, we will implement a regression model with two layers. We will then use autograd to automatically calculate the gradient of the loss with respect to each of the weight parameters −

First, import autograd and the other required packages as follows −

from mxnet import autograd
import mxnet as mx
from mxnet.gluon.nn import HybridSequential, Dense
from mxnet.gluon.loss import L2Loss

Now, we need to define the network as follows −

N_net = HybridSequential()
N_net.add(Dense(units=3))
N_net.add(Dense(units=1))
N_net.initialize()

Next, we define the loss as follows −

loss_function = L2Loss()

Next, we create the dummy data as follows −

x = mx.nd.array([[0.5, 0.9]])
y = mx.nd.array([[1.5]])

Now we are ready for our first forward pass through the network. We want autograd to record the computational graph so that we can calculate the gradients. For this, we run the network code within the scope of the autograd.record context as follows −

with autograd.record():
   y_hat = N_net(x)
   loss = loss_function(y_hat, y)

Now we are ready for the backward pass, which we start by calling the backward method on the quantity of interest. In our example, the quantity of interest is the loss, because we want the gradient of the loss with respect to the parameters −

loss.backward()

We now have gradients for each parameter of the network, which the optimiser will use to update the parameter values for improved performance. Let’s check the gradients of the first layer as follows −

N_net[0].weight.grad()

Output

The output is as follows −

[[-0.00470527 -0.00846948]
[-0.03640365 -0.06552657]
[ 0.00800354 0.01440637]]
<NDArray 3x2 @cpu(0)>

Complete implementation example

Given below is the complete implementation example −

from mxnet import autograd
import mxnet as mx
from mxnet.gluon.nn import HybridSequential, Dense
from mxnet.gluon.loss import L2Loss
N_net = HybridSequential()
N_net.add(Dense(units=3))
N_net.add(Dense(units=1))
N_net.initialize()
loss_function = L2Loss()
x = mx.nd.array([[0.5, 0.9]])
y = mx.nd.array([[1.5]])
with autograd.record():
   y_hat = N_net(x)
   loss = loss_function(y_hat, y)
loss.backward()
N_net[0].weight.grad()

Apache MXNet - NDArray

In this chapter, we will discuss MXNet’s multi-dimensional array format, called ndarray.

Handling data with NDArray

First, we will see how to handle data with NDArray.

Prerequisites

To follow the examples on handling data with this multi-dimensional array format, we need the following −

MXNet installed in a Python environment
Python 2.7.x or Python 3.x

Implementation Example

Let us understand the basic functionality with the help of the example given below −

First, we need to import MXNet and ndarray from MXNet as follows −

import mxnet as mx
from mxnet import nd

Once we have imported the necessary libraries, we can go through the following basic functionalities −

A simple 1-D array with a python list

Example

x = nd.array([1,2,3,4,5,6,7,8,9,10])
print(x)

Output

The output is as follows −

[ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]
<NDArray 10 @cpu(0)>

A 2-D array with a python list

Example

y = nd.array([[1,2,3,4,5,6,7,8,9,10], [1,2,3,4,5,6,7,8,9,10], [1,2,3,4,5,6,7,8,9,10]])
print(y)

Output

The output is as follows −

[[ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]
[ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]
[ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]]
<NDArray 3x10 @cpu(0)>

Creating an NDArray without any initialisation

Here, we will create a matrix with 3 rows and 4 columns by using the .empty function, which allocates the array without initialising its values. We will also use the .full function, which takes an additional argument for the value to fill the array with.

Example

x = nd.empty((3, 4))
print(x)
x = nd.full((3,4), 8)
print(x)

Output

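The output is given below. The values printed for the empty array are whatever happened to be in memory, so they will differ from run to run −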

[[0.000e+00 0.000e+00 0.000e+00 0.000e+00]
 [0.000e+00 0.000e+00 2.887e-42 0.000e+00]
 [0.000e+00 0.000e+00 0.000e+00 0.000e+00]]
<NDArray 3x4 @cpu(0)>

[[8. 8. 8. 8.]
 [8. 8. 8. 8.]
 [8. 8. 8. 8.]]
<NDArray 3x4 @cpu(0)>

Matrix of all zeros with the .zeros function

Example

x = nd.zeros((3, 8))
print(x)

Output

The output is as follows −

[[0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0.]]
<NDArray 3x8 @cpu(0)>

Matrix of all ones with the .ones function

Example

x = nd.ones((3, 8))
print(x)

Output

The output is as follows −

[[1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1.]]
<NDArray 3x8 @cpu(0)>

Creating array whose values are sampled randomly

Example

y = nd.random_normal(0, 1, shape=(3, 4))
print(y)

Output

The output will be similar to the following; the values are random, so they will differ from run to run −

[[ 1.2673576 -2.0345826 -0.32537818 -1.4583491 ]
 [-0.11176403 1.3606371 -0.7889914 -0.17639421]
 [-0.2532185 -0.42614475 -0.12548696 1.4022992 ]]
<NDArray 3x4 @cpu(0)>

Finding dimension of each NDArray

Example

y.shape

Output

The output is as follows −

(3, 4)

Finding the size of each NDArray

Example

y.size

Output

12

Finding the datatype of each NDArray

Example

y.dtype

Output

numpy.float32

NDArray Operations

In this section, we will introduce MXNet’s array operations. NDArray supports a large number of standard mathematical operations as well as in-place operations.

Standard Mathematical Operations

Following are the standard mathematical operations supported by NDArray −

Element-wise addition

First, we need to import MXNet and ndarray from MXNet as follows −

import mxnet as mx
from mxnet import nd
x = nd.ones((3, 5))
y = nd.random_normal(0, 1, shape=(3, 5))
print('x=', x)
print('y=', y)
x = x + y
print('x = x + y, x=', x)

Output

The output is given below −

x=
[[1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1.]]
<NDArray 3x5 @cpu(0)>
y=
[[-1.0554522 -1.3118273 -0.14674698 0.641493 -0.73820823]
[ 2.031364 0.5932667 0.10228804 1.179526 -0.5444829 ]
[-0.34249446 1.1086396 1.2756858 -1.8332436 -0.5289873 ]]
<NDArray 3x5 @cpu(0)>
x = x + y, x=
[[-0.05545223 -0.3118273 0.853253 1.6414931 0.26179177]
[ 3.031364 1.5932667 1.102288 2.1795259 0.4555171 ]
[ 0.6575055 2.1086397 2.2756858 -0.8332436 0.4710127 ]]
<NDArray 3x5 @cpu(0)>

Element-wise multiplication

Example

x = nd.array([1, 2, 3, 4])
y = nd.array([2, 2, 2, 1])
x * y

Output

You will see the following output −

[2. 4. 6. 4.]
<NDArray 4 @cpu(0)>

Exponentiation

Example

nd.exp(x)

Output

When you run the code, you will see the following output −

[ 2.7182817 7.389056 20.085537 54.59815 ]
<NDArray 4 @cpu(0)>

Matrix transpose to compute matrix-matrix product

Example

nd.dot(x, y.T)

Output

Given below is the output of the code −

[16.]
<NDArray 1 @cpu(0)>

In-place Operations

In the examples above, every time we ran an operation, new memory was allocated to hold its result.

For example, if we write A = A+B, we dereference the matrix that A used to point to and instead point A at the newly allocated memory. Let us understand this with the example given below, using Python’s id() function −

print('y=', y)
print('id(y):', id(y))
y = y + x
print('after y=y+x, y=', y)
print('id(y):', id(y))

Output

Upon execution, you will receive the following output −

y=
[2. 2. 2. 1.]
<NDArray 4 @cpu(0)>
id(y): 2438905634376
after y=y+x, y=
[3. 4. 5. 5.]
<NDArray 4 @cpu(0)>
id(y): 2438905685664

We can also assign the result to a previously allocated array as follows −

print('x=', x)
z = nd.zeros_like(x)
print('z is zeros_like x, z=', z)
print('id(z):', id(z))
print('y=', y)
z[:] = x + y
print('z[:] = x + y, z=', z)
print('id(z) is the same as before:', id(z))

Output

The output is shown below −

x=
[1. 2. 3. 4.]
<NDArray 4 @cpu(0)>
z is zeros_like x, z=
[0. 0. 0. 0.]
<NDArray 4 @cpu(0)>
id(z): 2438905790760
y=
[3. 4. 5. 5.]
<NDArray 4 @cpu(0)>
z[:] = x + y, z=
[4. 6. 8. 9.]
<NDArray 4 @cpu(0)>
id(z) is the same as before: 2438905790760

Although z keeps the same id, x+y still allocates a temporary buffer to store the result before copying it to z. To make better use of memory and avoid the temporary buffer, we can perform the operation in place by specifying the out keyword argument, which every operator supports, as follows −

print('x=', x, 'is in id(x):', id(x))
print('y=', y, 'is in id(y):', id(y))
print('z=', z, 'is in id(z):', id(z))
nd.elemwise_add(x, y, out=z)
print('after nd.elemwise_add(x, y, out=z), x=', x, 'is in id(x):', id(x))
print('after nd.elemwise_add(x, y, out=z), y=', y, 'is in id(y):', id(y))
print('after nd.elemwise_add(x, y, out=z), z=', z, 'is in id(z):', id(z))

Output

On executing the above program, you will get the following result −

x=
[1. 2. 3. 4.]
<NDArray 4 @cpu(0)> is in id(x): 2438905791152
y=
[3. 4. 5. 5.]
<NDArray 4 @cpu(0)> is in id(y): 2438905685664
z=
[4. 6. 8. 9.]
<NDArray 4 @cpu(0)> is in id(z): 2438905790760
after nd.elemwise_add(x, y, out=z), x=
[1. 2. 3. 4.]
<NDArray 4 @cpu(0)> is in id(x): 2438905791152
after nd.elemwise_add(x, y, out=z), y=
[3. 4. 5. 5.]
<NDArray 4 @cpu(0)> is in id(y): 2438905685664
after nd.elemwise_add(x, y, out=z), z=
[4. 6. 8. 9.]
<NDArray 4 @cpu(0)> is in id(z): 2438905790760

NDArray Contexts

In Apache MXNet, each array has a context. One context could be the CPU, whereas other contexts might be several GPUs. Things get even more complicated when we deploy the work across multiple servers. That is why we need to assign arrays to contexts intelligently, which minimises the time spent transferring data between devices.

For example, try initialising an array as follows −

import mxnet as mx
from mxnet import nd
z = nd.ones(shape=(3,3), ctx=mx.cpu(0))
print(z)

Output

When you execute the above code, you should see the following output −

[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
<NDArray 3x3 @cpu(0)>

We can copy a given NDArray from one context to another by using the copyto() method as follows −

# Requires a machine with at least one GPU
x_gpu = x.copyto(mx.gpu(0))
print(x_gpu)

NumPy array vs. NDArray

We are all familiar with NumPy arrays, but Apache MXNet offers its own array implementation named NDArray. It was designed to be similar to NumPy, but there is a key difference −

The key difference lies in the way calculations are executed in NumPy and NDArray. Every NDArray manipulation in MXNet is asynchronous and non-blocking, which means that when we write code like c = a * b, the function is pushed to the Execution Engine, which will start the calculation.

Here, a and b are both NDArrays. The benefit is that the function returns immediately, and the user thread can continue execution even though the calculation may not have completed yet.

Working of Execution Engine

The execution engine builds the computation graph. It may reorder or combine some calculations, but it always honours dependency order.

For example, if ‘X’ is manipulated again later in the code, the Execution Engine will start those manipulations once the result of ‘X’ is available. The execution engine handles some important work for the user, such as writing the callbacks that start the execution of subsequent code.

In Apache MXNet, to get the result of a computation on NDArrays we only need to access the resulting variable. The flow of the code is blocked until the computation results have been assigned to that variable. In this way, MXNet increases code performance while still supporting the imperative programming mode.
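
The submit-now, block-on-access behaviour described above can be imitated with Python's standard library. The sketch below is only an analogy that uses a thread pool in place of the Execution Engine; slow_multiply and the sleep are hypothetical stand-ins for a long-running array operation.

```python
from concurrent.futures import ThreadPoolExecutor
import time

# Analogy for asynchronous execution: submitting work returns immediately,
# and the caller blocks only when it asks for the result.
# The thread pool stands in for the Execution Engine; this is not MXNet code.

def slow_multiply(a, b):
    time.sleep(0.2)            # stand-in for a long-running computation
    return [x * y for x, y in zip(a, b)]


with ThreadPoolExecutor(max_workers=1) as engine:
    start = time.perf_counter()
    future = engine.submit(slow_multiply, [1, 2, 3], [4, 5, 6])  # returns at once
    submit_time = time.perf_counter() - start
    c = future.result()        # blocks until the computation has finished
    total_time = time.perf_counter() - start

print(c)  # [4, 10, 18]
print(submit_time < total_time)  # True
```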

Converting NDArray to NumPy Array

Let us learn how to convert an NDArray to a NumPy array in MXNet.

Combining higher-level operator with the help of few lower-level operators

Sometimes, we can assemble a higher-level operator from existing operators. A good example is the np.full_like() operator, which is not available in the NDArray API. It can easily be replaced with a combination of existing operators as follows −

from mxnet import nd
import numpy as np
np_x = np.full_like(a=np.arange(7, dtype=int), fill_value=15)
nd_x = nd.ones(shape=(7,)) * 15
np.array_equal(np_x, nd_x.asnumpy())

Output

We will get output similar to the following −

True

Finding similar operator with different name and/or signature

Some operators have slightly different names but are similar in terms of functionality. An example of this is nd.ravel_index() and np.ravel(). In the same way, some operators have similar names but different signatures, for example np.split() and nd.split().

The following example pads a 1-D array with nd.pad, reshaping the data to 4-D before the call and back to 1-D afterwards −

def pad_array123(data, max_length):
   # Reshape the 1-D input to 4-D before calling nd.pad
   data_expanded = data.reshape(1, 1, 1, data.shape[0])
   data_padded = nd.pad(data_expanded,
      mode='constant',
      pad_width=[0, 0, 0, 0, 0, 0, 0, max_length - data.shape[0]],
      constant_value=0)
   # Flatten the padded result back to 1-D
   data_reshaped_back = data_padded.reshape(max_length)
   return data_reshaped_back

pad_array123(nd.array([1, 2, 3]), max_length=10)

Output

The output is stated below −

[1. 2. 3. 0. 0. 0. 0. 0. 0. 0.]
<NDArray 10 @cpu(0)>

Minimising impact of blocking calls

In some cases, we have to use the .asnumpy() or .asscalar() method, but this forces MXNet to block execution until the result can be retrieved. We can minimise the impact of a blocking call by calling .asnumpy() or .asscalar() at a moment when we expect the calculation of the value to be done already. The example below does this by printing the loss from the previous iteration rather than the current one.

Implementation Example

Example

from __future__ import print_function
import mxnet as mx
from mxnet import gluon, nd, autograd
from mxnet.ndarray import NDArray
from mxnet.gluon import HybridBlock
import numpy as np

class LossBuffer(object):
   """
   Simple buffer for storing loss value
   """

   def __init__(self):
      self._loss = None

   def new_loss(self, loss):
      ret = self._loss
      self._loss = loss
      return ret

   @property
   def loss(self):
      return self._loss

net = gluon.nn.Dense(10)
ce = gluon.loss.SoftmaxCELoss()
net.initialize()
data = nd.random.uniform(shape=(1024, 100))
label = nd.array(np.random.randint(0, 10, (1024,)), dtype='int32')
train_dataset = gluon.data.ArrayDataset(data, label)
train_data = gluon.data.DataLoader(train_dataset, batch_size=128, shuffle=True, num_workers=2)
trainer = gluon.Trainer(net.collect_params(), optimizer='sgd')
loss_buffer = LossBuffer()
for data, label in train_data:
   with autograd.record():
      out = net(data)
      # This call saves new loss and returns previous loss
      prev_loss = loss_buffer.new_loss(ce(out, label))
   loss_buffer.loss.backward()
   trainer.step(data.shape[0])
   if prev_loss is not None:
      print("Loss: {}".format(np.mean(prev_loss.asnumpy())))

Output

The output will be similar to the following −

Loss: 2.3373236656188965
Loss: 2.3656985759735107
Loss: 2.3613128662109375
Loss: 2.3197104930877686
Loss: 2.3054862022399902
Loss: 2.329197406768799
Loss: 2.318927526473999

Apache MXNet - Gluon

Another important MXNet Python package is Gluon, which we will discuss in this chapter. Gluon provides a clear, concise, and simple API for deep learning projects. It lets us prototype, build, and train deep learning models in Apache MXNet without sacrificing training speed.

Blocks

Blocks form the basis of more complex network designs. As a neural network grows more complex, we need to move from designing single neurons to designing entire layers of neurons. For example, a design such as ResNet-152 has a high degree of regularity because it consists of blocks of repeated layers.

Example

In the example given below, we will write a simple block, namely a block for a multilayer perceptron.

from mxnet import nd
from mxnet.gluon import nn
x = nd.random.uniform(shape=(2, 20))
N_net = nn.Sequential()
N_net.add(nn.Dense(256, activation='relu'))
N_net.add(nn.Dense(10))
N_net.initialize()
N_net(x)

Output

This produces the following output:

[[ 0.09543004 0.04614332 -0.00286655 -0.07790346 -0.05130241 0.02942038
0.08696645 -0.0190793 -0.04122177 0.05088576]
[ 0.0769287 0.03099706 0.00856576 -0.044672 -0.06926838 0.09132431
0.06786592 -0.06187843 -0.03436674 0.04234696]]
<NDArray 2x10 @cpu(0)>

Steps needed to go from defining layers to defining blocks of one or more layers −

Step 1 − A block takes the data as input.

Step 2 − The block stores its state in the form of parameters. For example, in the code above the block contains two Dense layers (one hidden layer and one output layer), and we need a place to store their parameters.

Step 3 − The block invokes the forward function to perform forward propagation, also called forward computation. As part of the first forward call, the block initialises its parameters lazily.

Step 4 − Finally, the block invokes the backward function and calculates the gradient with respect to its input. Typically, this step is performed automatically.
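The first three steps can be sketched in plain Python, without MXNet. This is only an illustration under stated assumptions: TinyDense is a hypothetical toy class, not the Gluon Dense layer, and the initialisation range is arbitrary. It shows a block that takes data as input, stores its parameters, and creates them lazily on the first forward call. The backward pass (Step 4) is left out, since MXNet derives it automatically.

```python
import random


class TinyDense:
    """Toy stand-in for a dense layer (hypothetical, not the Gluon class)."""

    def __init__(self, units):
        self.units = units
        self.weight = None  # shape is unknown until the first forward call
        self.bias = None

    def forward(self, x):
        if self.weight is None:
            # Lazy initialisation: infer the input size from the data itself
            in_units = len(x)
            self.weight = [[random.uniform(-0.07, 0.07) for _ in range(in_units)]
                           for _ in range(self.units)]
            self.bias = [0.0] * self.units
        # One output per unit: dot(weight_row, x) + bias
        return [sum(w * v for w, v in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]

    def __call__(self, x):
        return self.forward(x)


layer = TinyDense(3)
print(layer.weight)                      # None, nothing is allocated yet
out = layer([1.0, 2.0, 3.0, 4.0])        # first call creates a 3 x 4 weight
print(len(out), len(layer.weight), len(layer.weight[0]))
```

Deferring the allocation is what lets us write nn.Dense(256) without stating the input size: the size is read from the first batch.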

Sequential Block

A sequential block is a special kind of block in which the data flows through a sequence of blocks. Each block is applied to the output of the one before it, and the first block is applied to the input data itself.

Let us see how the Sequential class works −

from mxnet import nd
from mxnet.gluon import nn
class MySequential(nn.Block):
   def __init__(self, **kwargs):
      super(MySequential, self).__init__(**kwargs)

   def add(self, block):
      # Register the child block under its unique name
      self._children[block.name] = block

   def forward(self, x):
      # Apply each child block in the order in which it was added
      for block in self._children.values():
         x = block(x)
      return x

x = nd.random.uniform(shape=(2, 20))
N_net = MySequential()
N_net.add(nn.Dense(256, activation='relu'))
N_net.add(nn.Dense(10))
N_net.initialize()
N_net(x)

Output

The output is given below −

[[ 0.09543004 0.04614332 -0.00286655 -0.07790346 -0.05130241 0.02942038
0.08696645 -0.0190793 -0.04122177 0.05088576]
[ 0.0769287 0.03099706 0.00856576 -0.044672 -0.06926838 0.09132431
0.06786592 -0.06187843 -0.03436674 0.04234696]]
<NDArray 2x10 @cpu(0)>

Custom Block

The sequential block defined above simply chains blocks one after another. If we need more customisation, the Block class provides the required functionality. Block is the model constructor provided in the nn module, and we can inherit from it to define the model we want.

In the following example, the MLP class overrides the __init__ and forward functions of the Block class.

Let us see how it works.

from mxnet import nd
from mxnet.gluon import nn

class MLP(nn.Block):
   def __init__(self, **kwargs):
      super(MLP, self).__init__(**kwargs)
      self.hidden = nn.Dense(256, activation='relu') # Hidden layer
      self.output = nn.Dense(10) # Output layer

   def forward(self, x):
      hidden_out = self.hidden(x)
      return self.output(hidden_out)

x = nd.random.uniform(shape=(2, 20))
N_net = MLP()
N_net.initialize()
N_net(x)

Output

When you run the code, you will see the following output:

[[ 0.07787763 0.00216403 0.01682201 0.03059879 -0.00702019 0.01668715
0.04822846 0.0039432 -0.09300035 -0.04494302]
[ 0.08891078 -0.00625484 -0.01619131 0.0380718 -0.01451489 0.02006172
0.0303478 0.02463485 -0.07605448 -0.04389168]]
<NDArray 2x10 @cpu(0)>

Custom Layers

Apache MXNet’s Gluon API comes with a modest number of predefined layers. At some point, however, we may find that we need a new layer. Adding one to the Gluon API is easy, and in this section we will see how to create a new layer from scratch.

The Simplest Custom Layer

To create a new layer in the Gluon API, we must create a class that inherits from the Block class, which provides the most basic functionality. All predefined layers inherit from it, either directly or via other subclasses.

To create the new layer, the only instance method we need to implement is forward(self, x). This method defines exactly what our layer does during forward propagation. As discussed earlier, the back-propagation pass for blocks is done automatically by Apache MXNet.

Example

In the example below, we define a new layer. We implement its forward() method to normalise the input data by rescaling it into the range [0, 1].

from __future__ import print_function
import mxnet as mx
from mxnet import nd, gluon, autograd
from mxnet.gluon.nn import Dense
mx.random.seed(1)
class NormalizationLayer(gluon.Block):
   def __init__(self):
      super(NormalizationLayer, self).__init__()

   def forward(self, x):
      return (x - nd.min(x)) / (nd.max(x) - nd.min(x))
x = nd.random.uniform(shape=(2, 20))
N_net = NormalizationLayer()
N_net.initialize()
N_net(x)

Output

On executing the above program, you will get the following result −

[[0.5216355  0.03835821 0.02284337 0.5945146  0.17334817 0.69329053
  0.7782702  1.         0.5508242  0.         0.07058554 0.3677264
  0.4366546  0.44362497 0.7192635  0.37616986 0.6728799  0.7032008
  0.46907538 0.63514024]
 [0.9157533  0.7667402  0.08980197 0.03593295 0.16176797 0.27679572
  0.07331014 0.3905285  0.6513384  0.02713427 0.05523694 0.12147208
  0.45582628 0.8139887  0.91629887 0.36665893 0.07873632 0.78268915
  0.63404864 0.46638715]]
<NDArray 2x20 @cpu(0)>

Hybridisation

Hybridisation is the process Apache MXNet uses to create a symbolic graph of a forward computation. It allows MXNet to improve computation performance by optimising that symbolic graph. In fact, when we look at the implementation of existing layers, we may find that a block inherits from HybridBlock rather than directly from Block.

Following are the reasons for this −

Allows us to write custom layers − HybridBlock allows us to write custom layers that can be used in both imperative and symbolic programming.
Increases computation performance − HybridBlock optimises the computational symbolic graph, which allows MXNet to increase computation performance.

Example

In this example, we rewrite the example layer created above using HybridBlock:

class NormalizationHybridLayer(gluon.HybridBlock):
   def __init__(self):
      super(NormalizationHybridLayer, self).__init__()

   def hybrid_forward(self, F, x):
      return F.broadcast_div(F.broadcast_sub(x, F.min(x)), (F.broadcast_sub(F.max(x), F.min(x))))

layer_hybd = NormalizationHybridLayer()
layer_hybd(nd.array([1, 2, 3, 4, 5, 6], ctx=mx.cpu()))

Output

The output is stated below:

[0. 0.2 0.4 0.6 0.8 1. ]
<NDArray 6 @cpu(0)>

Hybridisation has nothing to do with computation on a GPU. We can train hybridised as well as non-hybridised networks on both CPU and GPU.
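The arithmetic behind the normalisation layer's output for the input [1, 2, 3, 4, 5, 6] can be checked by hand with a few lines of plain Python. This is a minimal sketch of min-max scaling on a flat list, not the MXNet implementation. The helper name min_max_normalise is an assumption made for illustration.

```python
def min_max_normalise(values):
    """Rescale values into [0, 1] using the global minimum and maximum."""
    lo, hi = min(values), max(values)
    # Same formula as the layer: (x - min) / (max - min)
    return [(v - lo) / (hi - lo) for v in values]


result = min_max_normalise([1, 2, 3, 4, 5, 6])
print(result)
```

As in the layer itself, the formula divides by max - min, so an input whose elements are all equal would need special handling.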

Difference between Block and HybridBlock

If we compare the Block class with HybridBlock, we see that HybridBlock already has its forward() method implemented. Instead, HybridBlock defines a hybrid_forward() method that we must implement when creating a layer. The F argument is the main difference between forward() and hybrid_forward(). In the MXNet community, the F argument is referred to as the backend. F can refer either to the mxnet.ndarray API (used for imperative programming) or to the mxnet.symbol API (used for symbolic programming).
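The role of the F argument can be illustrated with a toy sketch in plain Python. EagerBackend and SymbolicBackend are hypothetical stand-ins for mxnet.ndarray and mxnet.symbol, and the sub and div functions are invented for this illustration. The point is only that the same hybrid_forward body either computes a value or builds a description of the computation, depending on what is passed as F.

```python
class EagerBackend:
    """Computes values immediately (plays the role of mxnet.ndarray)."""

    @staticmethod
    def sub(a, b):
        return a - b

    @staticmethod
    def div(a, b):
        return a / b


class SymbolicBackend:
    """Builds an expression string instead of computing (plays the role of mxnet.symbol)."""

    @staticmethod
    def sub(a, b):
        return "({} - {})".format(a, b)

    @staticmethod
    def div(a, b):
        return "({} / {})".format(a, b)


def hybrid_forward(F, x, lo, hi):
    # Written once against F; the backend decides what each call means
    return F.div(F.sub(x, lo), F.sub(hi, lo))


eager = hybrid_forward(EagerBackend, 3.0, 1.0, 6.0)
graph = hybrid_forward(SymbolicBackend, "x", "lo", "hi")
print(eager)   # 0.4
print(graph)   # ((x - lo) / (hi - lo))
```

This is why a hybrid layer should only use operations reached through F: anything called directly on concrete arrays would have no symbolic counterpart.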

How to add custom layer to a network?

Custom layers are rarely used on their own. They are usually combined with predefined layers. We can use either the Sequential or the HybridSequential container to form a sequential neural network. As discussed earlier, the Sequential container inherits from Block, and HybridSequential inherits from HybridBlock.

Example

In the example below, we create a simple neural network with a custom layer. The output of the Dense(5) layer is the input of NormalizationHybridLayer, and the output of NormalizationHybridLayer is the input of the Dense(1) layer.

net = gluon.nn.HybridSequential()
with net.name_scope():
   net.add(Dense(5))
   net.add(NormalizationHybridLayer())
   net.add(Dense(1))
net.initialize(mx.init.Xavier(magnitude=2.24))
net.hybridize()
input = nd.random_uniform(low=-10, high=10, shape=(10, 2))
net(input)

Output

You will see the following output −

[[-1.1272651]
 [-1.2299833]
 [-1.0662932]
 [-1.1805027]
 [-1.3382034]
 [-1.2081106]
 [-1.1263978]
 [-1.2524893]
 [-1.1044774]
 [-1.316593 ]]
<NDArray 10x1 @cpu(0)>

Custom layer parameters

In a neural network, a layer has a set of parameters associated with it. We sometimes refer to them as weights, and they form the internal state of the layer. These parameters play different roles −

Sometimes they are the values we want to learn during the backpropagation step.
Sometimes they are just constants we want to use during the forward pass.

In programming terms, the parameters (weights) of a block are stored and accessed via the ParameterDict class, which helps with initialising, updating, saving, and loading them.

Example

In the example below, we define the following two sets of parameters −

Parameter weights − This is trainable, and its shape is unknown during the construction phase. It is inferred on the first run of forward propagation.
Parameter scales − This is a constant whose value does not change. In contrast to the weights parameter, its shape is defined during construction.

class NormalizationHybridLayer(gluon.HybridBlock):
   def __init__(self, hidden_units, scales):
      super(NormalizationHybridLayer, self).__init__()
      with self.name_scope():
         # Trainable; the second dimension (0) is inferred on the first forward pass
         self.weights = self.params.get('weights',
            shape=(hidden_units, 0),
            allow_deferred_init=True)
         # Constant; its shape is known at construction and it is not differentiated
         self.scales = self.params.get('scales',
            shape=scales.shape,
            init=mx.init.Constant(scales.asnumpy()),
            differentiable=False)

   def hybrid_forward(self, F, x, weights, scales):
      normalized_data = F.broadcast_div(F.broadcast_sub(x, F.min(x)),
         (F.broadcast_sub(F.max(x), F.min(x))))
      weighted_data = F.FullyConnected(normalized_data, weights, num_hidden=self.weights.shape[0], no_bias=True)
      scaled_data = F.broadcast_mul(scales, weighted_data)
      return scaled_data

Apache MXNet - KVStore and Visualization

This chapter deals with the Python packages KVStore and visualization.

KVStore package

KVStore stands for Key-Value store. It is a critical component used for multi-device training, because parameters are communicated across devices, on a single machine as well as across multiple machines, through one or more servers that hold a KVStore for the parameters.

Let us understand how KVStore works with the help of the following points:

Each entry in a KVStore is represented by a key and a value.
Each parameter array in the network is assigned a key, and the weights of that parameter array are the corresponding value.
The worker nodes push gradients after processing a batch, and pull the updated weights before processing a new batch.

In simple words, KVStore is a place for data sharing where each device can push data in and pull data out.

Data Push-In and Pull-Out

KVStore can be thought of as a single object shared across different devices, such as GPUs and computers, where each device is able to push data in and pull data out.

Following are the implementation steps that devices need to follow to push data in and pull data out:

Implementation steps

Initialisation − The first step is to initialise the values. For our example, we initialise an (int, NDArray) pair in the KVStore and then pull the value out −

import mxnet as mx
kv = mx.kv.create('local') # create a local KVStore.
shape = (3,3)
kv.init(3, mx.nd.ones(shape)*2)
a = mx.nd.zeros(shape)
kv.pull(3, out = a)
print(a.asnumpy())

Output

This produces the following output −

[[2. 2. 2.]
 [2. 2. 2.]
 [2. 2. 2.]]

Push, Aggregate, and Update − Once a key is initialised, we can push a new value with the same shape to that key −

kv.push(3, mx.nd.ones(shape)*8)
kv.pull(3, out = a)
print(a.asnumpy())

Output

The output is given below −

[[8. 8. 8.]
 [8. 8. 8.]
 [8. 8. 8.]]

The data used for pushing can be stored on any device, such as a GPU or another computer. We can also push multiple values to the same key. In this case, the KVStore first sums all of these values and then pushes the aggregated value, as follows −

contexts = [mx.cpu(i) for i in range(4)]
b = [mx.nd.ones(shape, ctx) for ctx in contexts]
kv.push(3, b)
kv.pull(3, out = a)
print(a.asnumpy())

Output

You will see the following output −

[[4. 4. 4.]
 [4. 4. 4.]
 [4. 4. 4.]]

For each push, KVStore combines the pushed value with the value already stored, using an updater. The default updater is ASSIGN, which simply replaces the stored value with the pushed one. We can install our own updater as follows −

def update(key, input, stored):
   print("update on key: %d" % key)
   stored += input * 2
kv._set_updater(update)
kv.pull(3, out=a)
print(a.asnumpy())

Output

When you execute the above code, you should see the following output −

[[4. 4. 4.]
 [4. 4. 4.]
 [4. 4. 4.]]

Example

kv.push(3, mx.nd.ones(shape))
kv.pull(3, out=a)
print(a.asnumpy())

Output

Given below is the output of the code −

update on key: 3
[[6. 6. 6.]
 [6. 6. 6.]
 [6. 6. 6.]]

Pull − As with push, we can also pull the value onto several devices with a single call, as follows −

b = [mx.nd.ones(shape, ctx) for ctx in contexts]
kv.pull(3, out = b)
print(b[1].asnumpy())

Output

The output is stated below −

[[6. 6. 6.]
 [6. 6. 6.]
 [6. 6. 6.]]

Complete Implementation Example

Given below is the complete implementation example −

import mxnet as mx
kv = mx.kv.create('local')
shape = (3,3)
kv.init(3, mx.nd.ones(shape)*2)
a = mx.nd.zeros(shape)
kv.pull(3, out = a)
print(a.asnumpy())
kv.push(3, mx.nd.ones(shape)*8)
kv.pull(3, out = a) # pull out the value
print(a.asnumpy())
contexts = [mx.cpu(i) for i in range(4)]
b = [mx.nd.ones(shape, ctx) for ctx in contexts]
kv.push(3, b)
kv.pull(3, out = a)
print(a.asnumpy())
def update(key, input, stored):
   print("update on key: %d" % key)
   stored += input * 2
kv._set_updater(update)
kv.pull(3, out=a)
print(a.asnumpy())
kv.push(3, mx.nd.ones(shape))
kv.pull(3, out=a)
print(a.asnumpy())
b = [mx.nd.ones(shape, ctx) for ctx in contexts]
kv.pull(3, out = b)
print(b[1].asnumpy())
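The push, aggregate, and update semantics used above can be mimicked in plain Python to see why the printed values are 2, 8, 4, and 6. ToyKVStore below is a hypothetical stand-in that works on scalars instead of NDArrays. Its method names and the updater signature are assumptions made for this sketch. In particular, the toy updater returns the new value, whereas the MXNet updater shown above modifies stored in place.

```python
class ToyKVStore:
    """Plain-Python stand-in for a local KVStore (hypothetical, scalar values only)."""

    def __init__(self):
        self._store = {}
        self._updater = None  # None means ASSIGN: overwrite the stored value

    def init(self, key, value):
        self._store[key] = value

    def set_updater(self, updater):
        self._updater = updater

    def push(self, key, value):
        # A list of values (one per device) is summed before the update
        merged = sum(value) if isinstance(value, list) else value
        if self._updater is None:
            self._store[key] = merged
        else:
            self._store[key] = self._updater(key, merged, self._store[key])

    def pull(self, key):
        return self._store[key]


kv = ToyKVStore()
kv.init(3, 2)
history = [kv.pull(3)]            # initial value

kv.push(3, 8)                     # ASSIGN replaces 2 with 8
history.append(kv.pull(3))

kv.push(3, [1, 1, 1, 1])          # four devices: summed to 4, then assigned
history.append(kv.pull(3))

kv.set_updater(lambda key, pushed, stored: stored + pushed * 2)
kv.push(3, 1)                     # 4 + 1 * 2
history.append(kv.pull(3))

print(history)                    # [2, 8, 4, 6]
```

The sketch separates the two stages that a push goes through: values pushed together are first aggregated, and only the aggregate is handed to the updater.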

Handling Key-Value Pairs

All the operations we have implemented above involve a single key, but KVStore also provides an interface for a list of key-value pairs −

For a single device

Following is an example that shows the KVStore interface for a list of key-value pairs on a single device −

keys = [5, 7, 9]
kv.init(keys, [mx.nd.ones(shape)]*len(keys))
kv.push(keys, [mx.nd.ones(shape)]*len(keys))
b = [mx.nd.zeros(shape)]*len(keys)
kv.pull(keys, out = b)
print(b[1].asnumpy())

Output

You will receive the following output −

update on key: 5
update on key: 7
update on key: 9
[[3. 3. 3.]
 [3. 3. 3.]
 [3. 3. 3.]]

For multiple devices

Following is an example that shows the KVStore interface for a list of key-value pairs on multiple devices −

b = [[mx.nd.ones(shape, ctx) for ctx in contexts]] * len(keys)
kv.push(keys, b)
kv.pull(keys, out = b)
print(b[1][1].asnumpy())

Output

You will see the following output −

update on key: 5
update on key: 7
update on key: 9
[[11. 11. 11.]
 [11. 11. 11.]
 [11. 11. 11.]]

Visualization package

The visualization package is the Apache MXNet package used to represent a neural network (NN) as a computation graph that consists of nodes and edges.

Visualize neural network

In the example below, we use mx.viz.plot_network to visualize a neural network. The prerequisites for this are as follows −

Prerequisites

Jupyter notebook
Graphviz library

Implementation Example

In the example below, we visualize a sample NN for linear matrix factorisation −

import mxnet as mx
user = mx.symbol.Variable('user')
item = mx.symbol.Variable('item')
score = mx.symbol.Variable('score')

# Set the dummy dimensions
k = 64
max_user = 100
max_item = 50

# The user feature lookup
user = mx.symbol.Embedding(data = user, input_dim = max_user, output_dim = k)

# The item feature lookup
item = mx.symbol.Embedding(data = item, input_dim = max_item, output_dim = k)

# predict by the inner product and then do sum
N_net = user * item
N_net = mx.symbol.sum_axis(data = N_net, axis = 1)
N_net = mx.symbol.Flatten(data = N_net)

# Defining the loss layer
N_net = mx.symbol.LinearRegressionOutput(data = N_net, label = score)

# Visualize the network
mx.viz.plot_network(N_net)

Apache MXNet - Python API ndarray

This chapter explains the ndarray library, which is available in Apache MXNet.

mxnet.ndarray

Apache MXNet’s NDArray library defines the core data structures for all mathematical computations. The two fundamental jobs of NDArray are as follows −

It supports fast execution on a wide range of hardware configurations.
It automatically parallelises multiple operations across available hardware.

The examples given below show how to create 1-D and 2-D NDArrays from regular Python lists by using nd.array −

import mxnet as mx
from mxnet import nd

x = nd.array([1,2,3,4,5,6,7,8,9,10])
print(x)

Output

The output is given below:

[ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]
<NDArray 10 @cpu(0)>

Example

y = nd.array([[1,2,3,4,5,6,7,8,9,10], [1,2,3,4,5,6,7,8,9,10], [1,2,3,4,5,6,7,8,9,10]])
print(y)

Output

This produces the following output −

[[ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]
 [ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]
 [ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]]
<NDArray 3x10 @cpu(0)>

Now let us discuss in detail the classes, functions, and parameters of the ndarray API of MXNet.

Classes

The following table lists the classes of the ndarray API of MXNet −

Class − Definition
CachedOp(sym[, flags]) − A handle for a cached operator.
NDArray(handle[, writable]) − An array object that represents a multi-dimensional, homogeneous array of fixed-size items.

Functions and their parameters

Following are some of the important functions and their parameters covered by the mxnet.ndarray API −

Function and its parameters − Definition
Activation([data, act_type, out, name]) − Applies an activation function element-wise to the input. It supports the relu, sigmoid, tanh, softrelu, and softsign activation functions.
BatchNorm([data, gamma, beta, moving_mean, …]) − Used for batch normalisation. It normalises a data batch by mean and variance, and applies a scale gamma and an offset beta.
BilinearSampler([data, grid, cudnn_off, …]) − Applies bilinear sampling to the input feature map. It is the key component of “Spatial Transformer Networks”. If you are familiar with the remap function in OpenCV, the usage is quite similar. The only difference is that this function has a backward pass.
BlockGrad([data, out, name]) − As the name specifies, this function stops gradient computation. It stops the accumulated gradient of the inputs from flowing backward through this operator.
cast([data, dtype, out, name]) − Casts all elements of the input to a new type.

Implementation Examples

In the examples below, we use the function BilinearSampler(), first to zoom out the data two times and then to shift the data horizontally by -1 pixel −

import mxnet as mx
from mxnet import nd
data = nd.array([[[[2, 5, 3, 6],
   [1, 8, 7, 9],
   [0, 4, 1, 8],
   [2, 0, 3, 4]]]])
affine_matrix = nd.array([[2, 0, 0],
   [0, 2, 0]])

affine_matrix = nd.reshape(affine_matrix, shape=(1, 6))

grid = nd.GridGenerator(data=affine_matrix, transform_type='affine', target_shape=(4, 4))

output = nd.BilinearSampler(data, grid)

Output

When you execute the above code, you should see the following output:

[[[[0. 0. 0. 0. ]
   [0. 4.0000005 6.25 0. ]
   [0. 1.5 4. 0. ]
   [0. 0. 0. 0. ]]]]
<NDArray 1x1x4x4 @cpu(0)>

The above output shows the data zoomed out two times.

An example of shifting the data by -1 pixel is as follows −

import mxnet as mx
from mxnet import nd
data = nd.array([[[[2, 5, 3, 6],
   [1, 8, 7, 9],
   [0, 4, 1, 8],
   [2, 0, 3, 4]]]])
warp_matrix = nd.array([[[[1, 1, 1, 1],
   [1, 1, 1, 1],
   [1, 1, 1, 1],
   [1, 1, 1, 1]],
   [[0, 0, 0, 0],
   [0, 0, 0, 0],
   [0, 0, 0, 0],
   [0, 0, 0, 0]]]])
grid = nd.GridGenerator(data=warp_matrix, transform_type='warp')
output = nd.BilinearSampler(data, grid)

Output

The output is stated below −

[[[[5. 3. 6. 0.]
   [8. 7. 9. 0.]
   [4. 1. 8. 0.]
   [0. 3. 4. 0.]]]]
<NDArray 1x1x4x4 @cpu(0)>

Similarly, the following example shows the use of the cast() function −

nd.cast(nd.array([300, 10.1, 15.4, -1, -2]), dtype='uint8')

Output

Upon execution, you will receive the following output −

[ 44 10 15 255 254]
<NDArray 5 @cpu(0)>
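The values in this output can be reproduced with a small plain-Python model of the conversion: drop the fractional part, then wrap the integer into the range 0 to 255. This is only a sketch that matches the printed result. The helper name cast_to_uint8 is an assumption, and the real cast runs in MXNet's native backend, where the handling of out-of-range values may depend on the platform.

```python
def cast_to_uint8(value):
    # Drop the fractional part, then wrap into the range 0..255
    return int(value) % 256


print([cast_to_uint8(v) for v in [300, 10.1, 15.4, -1, -2]])
# [44, 10, 15, 255, 254]
```

The wrap-around explains the two surprising entries: 300 - 256 = 44, and -1 maps to 255 because uint8 has no negative values.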

ndarray.contrib

The Contrib NDArray API is defined in the ndarray.contrib package. It typically provides many useful experimental APIs for new features. It serves as a place where the community can try out new features and where feature contributors can get feedback.

Functions and their parameters

Following are some of the important functions and their parameters covered by the mxnet.ndarray.contrib API −

Function and its parameters − Definition
rand_zipfian(true_classes, num_sampled, …) − Draws random samples from an approximately Zipfian distribution, which is the base distribution of this function. It randomly samples num_sampled candidates, and the elements of sampled_candidates are drawn from that base distribution.
foreach(body, data, init_states) − Runs a for loop with a user-defined computation over NDArrays on dimension 0. It simulates a for loop, where body holds the computation for one iteration.
while_loop(cond, func, loop_vars[, …]) − Runs a while loop with a user-defined computation and loop condition. It simulates a while loop that repeatedly performs the customised computation as long as the condition is satisfied.
cond(pred, then_func, else_func) − Runs an if-then-else using a user-defined condition and computation. It simulates an if-like branch that performs one of two customised computations according to the specified condition.
isinf(data) − Performs an element-wise check to determine whether the NDArray contains an infinite element.
getnnz([data, axis, out, name]) − Returns the number of stored values for a sparse tensor, including explicit zeros. It only supports CSR matrices on CPU.
requantize([data, min_range, max_range, …]) − Requantises data that is quantised in int32, together with the corresponding thresholds, into int8, using min and max thresholds that are either calculated at runtime or taken from calibration.

Implementation Examples

In the example below, we use the function rand_zipfian to draw random samples from an approximately Zipfian distribution −

import mxnet as mx
from mxnet import nd
trueclass = mx.nd.array([2])
samples, exp_count_true, exp_count_sample = mx.nd.contrib.rand_zipfian(trueclass, 3, 4)
samples

Output

You will see the following output −

[0 0 1]
<NDArray 3 @cpu(0)>

Example

exp_count_true

Output

The output is given below:

[0.53624076]
<NDArray 1 @cpu(0)>

Example

exp_count_sample

Output

This produces the following output:

[1.29202967 1.29202967 0.75578891]
<NDArray 3 @cpu(0)>
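The expected counts printed above can be checked by hand. The sketch below assumes the log-uniform form P(class) = (log(class + 2) - log(class + 1)) / log(range_max + 1) for the base distribution, and assumes that an expected count is this probability multiplied by num_sampled. Both are assumptions made here rather than statements taken from this tutorial, although they do reproduce the numbers shown for num_sampled = 3 and range_max = 4. The helper names are hypothetical.

```python
import math


def zipfian_prob(cls, range_max):
    """Probability of drawing class `cls` out of `range_max` classes (assumed form)."""
    return (math.log(cls + 2) - math.log(cls + 1)) / math.log(range_max + 1)


def expected_count(cls, num_sampled, range_max):
    """Expected number of times `cls` appears among `num_sampled` draws (assumed form)."""
    return zipfian_prob(cls, range_max) * num_sampled


probs = [zipfian_prob(c, 4) for c in range(4)]
print(sum(probs))                                      # close to 1.0
print(expected_count(2, 3, 4))                         # true class 2
print([expected_count(c, 3, 4) for c in (0, 0, 1)])    # sampled classes 0, 0, 1
```

Under this distribution small class ids are far more likely than large ones, which is why the samples above are dominated by 0 and 1.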

In the example below, we use the function while_loop to run a while loop with a user-defined computation and loop condition:

cond = lambda i, s: i <= 7
func = lambda i, s: ([i + s], [i + 1, s + i])
loop_vars = (mx.nd.array([0], dtype="int64"), mx.nd.array([1], dtype="int64"))
outputs, states = mx.nd.contrib.while_loop(cond, func, loop_vars, max_iterations=10)
outputs

Output

The output is shown below −

[
[[       1]
 [      2]
 [      4]
 [      7]
 [     11]
 [     16]
 [     22]
 [     29]
 [3152434450384]
 [     257]]
<NDArray 10x1 @cpu(0)>]

Example

states

Output

This produces the following output −

[
[8]
<NDArray 1 @cpu(0)>,
[29]
<NDArray 1 @cpu(0)>]
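The loop semantics can be traced with a plain-Python emulation. toy_while_loop below is a hypothetical helper, not the MXNet function, and it works on Python integers instead of NDArrays. It shows that the condition i <= 7 holds for eight steps, which yields the first eight rows of outputs and the final states 8 and 29. The MXNet output has ten rows because it is stacked up to max_iterations. The last two rows appear to be padding, so their values should not be read as results.

```python
def toy_while_loop(cond, func, loop_vars, max_iterations):
    """Run func while cond holds, collecting each step's output."""
    outputs = []
    steps = 0
    while steps < max_iterations and cond(*loop_vars):
        # func returns (output of this step, new loop variables)
        step_output, loop_vars = func(*loop_vars)
        outputs.append(step_output)
        steps += 1
    return outputs, list(loop_vars)


cond = lambda i, s: i <= 7
func = lambda i, s: ([i + s], [i + 1, s + i])
outputs, states = toy_while_loop(cond, func, (0, 1), max_iterations=10)
print(outputs)   # [[1], [2], [4], [7], [11], [16], [22], [29]]
print(states)    # [8, 29]
```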

ndarray.image

The Image NDArray API is defined in the ndarray.image package. As the name implies, it is typically used for images and their features.

Functions and their parameters

Following are some of the important functions and their parameters covered by the mxnet.ndarray.image API −

Function and its parameters − Definition
adjust_lighting([data, alpha, out, name]) − Adjusts the lighting level of the input. It follows the AlexNet style.
crop([data, x, y, width, height, out, name]) − Crops an image NDArray of shape (H x W x C) or (N x H x W x C) to the size given by the user.
normalize([data, mean, std, out, name]) − Normalises a tensor of shape (C x H x W) or (N x C x H x W) with mean and standard deviation (SD).
random_crop([data, xrange, yrange, width, …]) − Similar to crop(), it randomly crops an image NDArray of shape (H x W x C) or (N x H x W x C) to the size given by the user. It upsamples the result if src is smaller than the size.
random_lighting([data, alpha_std, out, name]) − Adds PCA noise randomly. It also follows the AlexNet style.
random_resized_crop([data, xrange, yrange, …]) − Randomly crops an image NDArray of shape (H x W x C) or (N x H x W x C) to the given size. It upsamples the result if src is smaller than the size, and it randomises the area and the aspect ratio as well.
resize([data, size, keep_ratio, interp, …]) − Resizes an image NDArray of shape (H x W x C) or (N x H x W x C) to the size given by the user.
to_tensor([data, out, name]) − Converts an image NDArray of shape (H x W x C) or (N x H x W x C) with values in the range [0, 255] to a tensor NDArray of shape (C x H x W) or (N x C x H x W) with values in the range [0, 1].

Implementation Examples

In the example below, we use the function to_tensor to convert an image NDArray of shape (H x W x C) or (N x H x W x C) with values in the range [0, 255] to a tensor NDArray of shape (C x H x W) or (N x C x H x W) with values in the range [0, 1].

import numpy as np
img = mx.nd.random.uniform(0, 255, (4, 2, 3)).astype(dtype=np.uint8)
mx.nd.image.to_tensor(img)

Output

You will see the following output −

[[[0.972549 0.5058824 ]
   [0.6039216 0.01960784]
   [0.28235295 0.35686275]
   [0.11764706 0.8784314 ]]

[[0.8745098 0.9764706 ]
   [0.4509804 0.03529412]
   [0.9764706 0.29411766]
   [0.6862745 0.4117647 ]]

[[0.46666667 0.05490196]
   [0.7372549 0.4392157 ]
   [0.11764706 0.47843137]
   [0.31764707 0.91764706]]]
<NDArray 3x4x2 @cpu(0)>

Example

img = mx.nd.random.uniform(0, 255, (2, 4, 2, 3)).astype(dtype=np.uint8)

mx.nd.image.to_tensor(img)

Output

When you run the code, you will see the following output −

[[[[0.0627451 0.5647059 ]
[0.2627451 0.9137255 ]
[0.57254905 0.27450982]
[0.6666667 0.64705884]]
[[0.21568628 0.5647059 ]
[0.5058824 0.09019608]
[0.08235294 0.31764707]
[0.8392157 0.7137255 ]]
[[0.6901961 0.8627451 ]
[0.52156866 0.91764706]
[0.9254902 0.00784314]
[0.12941177 0.8392157 ]]]
[[[0.28627452 0.39607844]
[0.01960784 0.36862746]
[0.6745098 0.7019608 ]
[0.9607843 0.7529412 ]]
[[0.2627451 0.58431375]
[0.16470589 0.00392157]
[0.5686275 0.73333335]
[0.43137255 0.57254905]]
[[0.18039216 0.54901963]
[0.827451 0.14509805]
[0.26666668 0.28627452]
[0.24705882 0.39607844]]]]
<NDArray 2x3x4x2 @cpu(0)>
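What to_tensor does to a single image can be written out with nested Python lists. This is a minimal sketch of the layout change and the scaling, not the MXNet implementation. The function below is a hypothetical plain-Python helper that happens to share the name to_tensor, and it handles only the unbatched (H x W x C) case.

```python
def to_tensor(img):
    """Convert img[h][w][c] with values 0..255 to out[c][h][w] with values 0..1."""
    height, width, channels = len(img), len(img[0]), len(img[0][0])
    return [[[img[h][w][c] / 255.0 for w in range(width)]
             for h in range(height)]
            for c in range(channels)]


# One row of two pixels with three channels each: shape (H=1, W=2, C=3)
img = [[[255, 0, 51], [0, 255, 102]]]
out = to_tensor(img)
print(out)   # [[[1.0, 0.0]], [[0.0, 1.0]], [[0.2, 0.4]]]
```

The result has shape (C=3, H=1, W=2): the channel axis moves to the front and every value is divided by 255, which matches the shape change from (4, 2, 3) to 3x4x2 in the first example above.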

In the example below, we use the function normalize to normalise a tensor of shape (C x H x W) or (N x C x H x W) with mean and standard deviation (SD).

img = mx.nd.random.uniform(0, 1, (3, 4, 2))

mx.nd.image.normalize(img, mean=(0, 1, 2), std=(3, 2, 1))

Output

This produces the following output:

[[[ 0.29391178 0.3218054 ]
[ 0.23084386 0.19615503]
[ 0.24175143 0.21988946]
[ 0.16710812 0.1777354 ]]
[[-0.02195817 -0.3847335 ]
[-0.17800489 -0.30256534]
[-0.28807247 -0.19059572]
[-0.19680339 -0.26256624]]
[[-1.9808068 -1.5298678 ]
[-1.6984252 -1.2839255 ]
[-1.3398265 -1.712009 ]
[-1.7099224 -1.6165378 ]]]
<NDArray 3x4x2 @cpu(0)>
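The output above follows from applying (value - mean[c]) / std[c] to every value of channel c. With mean=(0, 1, 2) and std=(3, 2, 1), inputs in [0, 1) land in [0, 1/3) for the first channel, [-0.5, 0) for the second and [-2, -1) for the third. A minimal plain-Python sketch of that arithmetic follows. The name normalize_sketch is hypothetical and this is not MXNet's implementation.

```python
def normalize_sketch(tensor, mean, std):
    """Apply (value - mean[c]) / std[c] to every value of channel c
    in a C x H x W nested list."""
    return [
        [[(value - mean[c]) / std[c] for value in row] for row in channel]
        for c, channel in enumerate(tensor)
    ]


# Three channels, each with one row of two values: shape (C=3, H=1, W=2).
tensor = [[[0.9, 0.3]], [[0.5, 1.0]], [[0.25, 0.75]]]
result = normalize_sketch(tensor, mean=(0, 1, 2), std=(3, 2, 1))
print(result)
```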

Example

img = mx.nd.random.uniform(0, 1, (2, 3, 4, 2))

mx.nd.image.normalize(img, mean=(0, 1, 2), std=(3, 2, 1))

Output

When you execute the above code, you should see the following output:

[[[[ 2.0600514e-01 2.4972327e-01]
[ 1.4292289e-01 2.9281738e-01]
[ 4.5158025e-02 3.4287784e-02]
[ 9.9427439e-02 3.0791296e-02]]
[[-2.1501756e-01 -3.2297665e-01]
[-2.0456362e-01 -2.2409186e-01]
[-2.1283737e-01 -4.8318747e-01]
[-1.7339960e-01 -1.5519112e-02]]
[[-1.3478968e+00 -1.6790028e+00]
[-1.5685816e+00 -1.7787373e+00]
[-1.1034534e+00 -1.8587360e+00]
[-1.6324382e+00 -1.9027401e+00]]]
[[[ 1.4528830e-01 3.2801408e-01]
[ 2.9730779e-01 8.6780310e-02]
[ 2.6873133e-01 1.7900752e-01]
[ 2.3462953e-01 1.4930873e-01]]
[[-4.4988656e-01 -4.5021546e-01]
[-4.0258706e-02 -3.2384416e-01]
[-1.4287934e-01 -2.6537544e-01]
[-5.7649612e-04 -7.9429924e-02]]
[[-1.8505517e+00 -1.0953522e+00]
[-1.1318740e+00 -1.9624406e+00]
[-1.8375070e+00 -1.4916846e+00]
[-1.3844404e+00 -1.8331525e+00]]]]
<NDArray 2x3x4x2 @cpu(0)>

ndarray.random

The Random NDArray API is defined in the ndarray.random package. As the name implies, it is MXNet's NDArray API for generating samples from random distributions.

Functions and their parameters

Following are some of the important functions and their parameters covered by the mxnet.ndarray.random API:

Function and its Parameters

Definition

uniform([low, high, shape, dtype, ctx, out])

It generates random samples from a uniform distribution.

normal([loc, scale, shape, dtype, ctx, out])

It generates random samples from a normal (Gaussian) distribution.

randn(*shape, **kwargs)

It generates random samples from a normal (Gaussian) distribution.

exponential([scale, shape, dtype, ctx, out])

It generates samples from an exponential distribution.

gamma([alpha, beta, shape, dtype, ctx, out])

It generates random samples from a gamma distribution.

multinomial(data[, shape, get_prob, out, dtype])

It samples concurrently from multiple multinomial distributions.

negative_binomial([k, p, shape, dtype, ctx, out])

It generates random samples from a negative binomial distribution.

generalized_negative_binomial([mu, alpha, …])

It generates random samples from a generalised negative binomial distribution.

shuffle(data, **kwargs)

It shuffles the elements randomly.

randint(low, high[, shape, dtype, ctx, out])

It generates random samples from a discrete uniform distribution.

exponential_like([data, lam, out, name])

It generates random samples from an exponential distribution according to the input array shape.

gamma_like([data, alpha, beta, out, name])

It generates random samples from a gamma distribution according to the input array shape.

generalized_negative_binomial_like([data, …])

It generates random samples from a generalised negative binomial distribution, according to the input array shape.

negative_binomial_like([data, k, p, out, name])

It generates random samples from a negative binomial distribution, according to the input array shape.

normal_like([data, loc, scale, out, name])

It generates random samples from a normal (Gaussian) distribution, according to the input array shape.

poisson_like([data, lam, out, name])

It generates random samples from a Poisson distribution, according to the input array shape.

uniform_like([data, low, high, out, name])

It generates random samples from a uniform distribution, according to the input array shape.

Implementation Examples

In the example below, we draw random samples from a uniform distribution. For this, we use the function uniform().

mx.nd.random.uniform(0, 1)

Output

The output is shown below:

[0.12381998]
<NDArray 1 @cpu(0)>

Example

mx.nd.random.uniform(-1, 1, shape=(2,))

Output

The output is given below:

[0.558102 0.69601643]
<NDArray 2 @cpu(0)>

Example

low = mx.nd.array([1,2,3])
high = mx.nd.array([2,3,4])
mx.nd.random.uniform(low, high, shape=2)

Output

You will see the following output:

[[1.8649333 1.8073189]
 [2.4113967 2.5691009]
 [3.1399727 3.4071832]]
<NDArray 3x2 @cpu(0)>
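In the last example, low and high are arrays of three bounds and shape=2, so the result has shape 3x2: row i holds two draws taken between low[i] and high[i]. The plain-Python sketch below mimics that shape rule with the standard random module. The name uniform_sketch and its seed argument are hypothetical and not part of MXNet.

```python
import random


def uniform_sketch(low, high, shape, seed=None):
    """For each (low[i], high[i]) pair, draw `shape` uniform samples."""
    rng = random.Random(seed)
    return [
        [rng.uniform(lo, hi) for _ in range(shape)]
        for lo, hi in zip(low, high)
    ]


samples = uniform_sketch([1, 2, 3], [2, 3, 4], shape=2, seed=0)
for row in samples:
    print(row)
```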

In the example below, we draw random samples from a generalized negative binomial distribution. For this, we use the function generalized_negative_binomial().

mx.nd.random.generalized_negative_binomial(10, 0.5)

Output

When you execute the above code, you should see the following output:

[1.]
<NDArray 1 @cpu(0)>

Example

mx.nd.random.generalized_negative_binomial(10, 0.5, shape=(2,))

Output

The output is given below:

[16. 23.]
<NDArray 2 @cpu(0)>

Example

mu = mx.nd.array([1,2,3])
alpha = mx.nd.array([0.2,0.4,0.6])
mx.nd.random.generalized_negative_binomial(mu, alpha, shape=2)

Output

Given below is the output of the code:

[[0. 0.]
 [4. 1.]
 [9. 3.]]
<NDArray 3x2 @cpu(0)>
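The two arguments are the mean mu and the dispersion alpha, which gives a variance of mu + alpha * mu^2. One standard way to sample such a distribution is a gamma-Poisson mixture: draw a rate from a gamma distribution with mean mu, then draw a Poisson count with that rate. The sketch below illustrates that construction in plain Python. It is an assumption about how such sampling can be done, not a copy of MXNet's sampler, and the function name is hypothetical.

```python
import math
import random


def generalized_negative_binomial_sketch(mu, alpha, rng):
    """Draw one count with mean mu and dispersion alpha."""
    # Gamma-distributed rate: shape 1/alpha, scale alpha * mu, so its mean is mu.
    rate = rng.gammavariate(1.0 / alpha, alpha * mu)
    # Knuth's multiplication method for a Poisson draw (fine for small rates).
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


rng = random.Random(0)
samples = [generalized_negative_binomial_sketch(2.0, 0.4, rng) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
print(sample_mean)  # should land close to mu = 2.0
```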

ndarray.utils

The utility NDArray API is defined in the ndarray.utils package. As the name implies, it provides utility functions for NDArray and BaseSparseNDArray.

Functions and their parameters

Following are some of the important functions and their parameters covered by the mxnet.ndarray.utils API:

Function and its Parameters

Definition

zeros(shape[, ctx, dtype, stype])

This function will return a new array of given shape and type, filled with zeros.

empty(shape[, ctx, dtype, stype])

It returns a new array of the given shape and type, without initialising entries.

array(source_array[, ctx, dtype])

As name implies, this function will create an array from any object exposing the array interface.

load(fname)

It will load an array from file.

load_frombuffer(buf)

As the name implies, this function will load an array dictionary or list from a buffer.

save(fname, data)

This function will save a list of arrays or a dict of str→array to file.

Implementation Examples

In the example below, we return a new array of the given shape and type, filled with zeros. For this, we use the function zeros().

mx.nd.zeros((1,2), mx.cpu(), stype='csr')

Output

This produces the following output:

<CSRNDArray 1x2 @cpu(0)>

Example

mx.nd.zeros((1,2), mx.cpu(), 'float16', stype='row_sparse').asnumpy()

Output

You will receive the following output:

array([[0., 0.]], dtype=float16)

In the example below, we save a list of arrays and a dictionary that maps strings to arrays. For this, we use the function save().

Example

x = mx.nd.zeros((2,3))
y = mx.nd.ones((1,4))
mx.nd.save('list', [x,y])
mx.nd.save('dict', {'x':x, 'y':y})
mx.nd.load('list')

Output

Upon execution, you will receive the following output:

[
[[0. 0. 0.]
[0. 0. 0.]]
<NDArray 2x3 @cpu(0)>,
[[1. 1. 1. 1.]]
<NDArray 1x4 @cpu(0)>]

Example

mx.nd.load('dict')

Output

The output is shown below:

{'x':
[[0. 0. 0.]
[0. 0. 0.]]
<NDArray 2x3 @cpu(0)>, 'y':
[[1. 1. 1. 1.]]
<NDArray 1x4 @cpu(0)>}

Apache MXNet - Python API gluon

As discussed in previous chapters, MXNet Gluon provides a clear, concise, and simple API for DL projects. It lets us prototype, build, and train DL models with Apache MXNet without sacrificing training speed.

Core Modules

Let us learn the core modules of the Apache MXNet Python application programming interface (API) gluon.

gluon.nn

Gluon provides a large number of built-in NN layers in the gluon.nn module. That is the reason it is called the core module.

Methods and their parameters

Following are some of the important methods and their parameters covered by the mxnet.gluon.nn core module:

Methods and its Parameters

Definition

Activation(activation, **kwargs)

As name implies, this method applies an activation function to input.

AvgPool1D([pool_size, strides, padding, …])

This is average pooling operation for temporal data.

AvgPool2D([pool_size, strides, padding, …])

This is average pooling operation for spatial data.

AvgPool3D([pool_size, strides, padding, …])

This is Average pooling operation for 3D data. The data can be spatial or spatio-temporal.

BatchNorm([axis, momentum, epsilon, center, …])

It represents batch normalisation layer.

BatchNormReLU([axis, momentum, epsilon, …])

It also represents batch normalisation layer but with Relu activation function.

Block([prefix, params])

It gives the base class for all neural network layers and models.

Conv1D(channels, kernel_size[, strides, …])

This method is used for 1-D convolution layer. For example, temporal convolution.

Conv1DTranspose(channels, kernel_size[, …])

This method is used for Transposed 1D convolution layer.

Conv2D(channels, kernel_size[, strides, …])

This method is used for 2D convolution layer. For example, spatial convolution over images.

Conv2DTranspose(channels, kernel_size[, …])

This method is used for Transposed 2D convolution layer.

Conv3D(channels, kernel_size[, strides, …])

This method is used for 3D convolution layer. For example, spatial convolution over volumes.

Conv3DTranspose(channels, kernel_size[, …])

This method is used for Transposed 3D convolution layer.

Dense(units[, activation, use_bias, …])

This method represents a regular densely-connected NN layer.

Dropout(rate[, axes])

As name implies, the method applies Dropout to the input.

ELU([alpha])

This method is used for Exponential Linear Unit (ELU).

Embedding(input_dim, output_dim[, dtype, …])

It turns non-negative integers into dense vectors of fixed size.

Flatten(**kwargs)

This method flattens the input to 2-D.

GELU(**kwargs)

This method is used for Gaussian Error Linear Unit (GELU).

GlobalAvgPool1D([layout])

With the help of this method, we can do global average pooling operation for temporal data.

GlobalAvgPool2D([layout])

With the help of this method, we can do global average pooling operation for spatial data.

GlobalAvgPool3D([layout])

With the help of this method, we can do global average pooling operation for 3-D data.

GlobalMaxPool1D([layout])

With the help of this method, we can do global max pooling operation for 1-D data.

GlobalMaxPool2D([layout])

With the help of this method, we can do global max pooling operation for 2-D data.

GlobalMaxPool3D([layout])

With the help of this method, we can do global max pooling operation for 3-D data.

GroupNorm([num_groups, epsilon, center, …])

This method applies group normalization to the n-D input array.

HybridBlock([prefix, params])

This method supports forwarding with both Symbol and NDArray.

HybridLambda(function[, prefix])

With the help of this method we can wrap an operator or an expression as a HybridBlock object.

HybridSequential([prefix, params])

It stacks HybridBlocks sequentially.

InstanceNorm([axis, epsilon, center, scale, …])

This method applies instance normalisation to the n-D input array.

Implementation Examples

In the example below, we use Block(), which is the base class for all neural network layers and models.

import mxnet as mx
from mxnet.gluon import Block, nn
class Model(Block):
   def __init__(self, **kwargs):
      super(Model, self).__init__(**kwargs)
      # use name_scope to give child Blocks appropriate names.
      with self.name_scope():
         self.dense0 = nn.Dense(20)
         self.dense1 = nn.Dense(20)
   def forward(self, x):

      x = mx.nd.relu(self.dense0(x))
      return mx.nd.relu(self.dense1(x))

model = Model()
model.initialize(ctx=mx.cpu(0))
model(mx.nd.zeros((5, 5), ctx=mx.cpu(0)))

Output

You will see the following output:

[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
<NDArray 5x20 @cpu(0)>

In the example below, we use HybridBlock(), which supports forwarding with both Symbol and NDArray.

import mxnet as mx
from mxnet.gluon import HybridBlock, nn


class Model(HybridBlock):
   def __init__(self, **kwargs):
      super(Model, self).__init__(**kwargs)
      # use name_scope to give child Blocks appropriate names.
      with self.name_scope():
         self.dense0 = nn.Dense(20)
         self.dense1 = nn.Dense(20)

   def hybrid_forward(self, F, x):
      # F is mx.nd before hybridize() is called and mx.sym afterwards.
      x = F.relu(self.dense0(x))
      return F.relu(self.dense1(x))
model = Model()
model.initialize(ctx=mx.cpu(0))

model.hybridize()
model(mx.nd.zeros((5, 5), ctx=mx.cpu(0)))

Output

The output is shown below:

[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
<NDArray 5x20 @cpu(0)>
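Both models print only zeros, and that is expected rather than a sign of a broken network: the input is all zeros, Dense biases are initialised to zero by default, and relu(0) is 0, so the randomly initialised weights never contribute. The plain-Python sketch below reproduces that reasoning for a single dense layer. The function name and the weight range of ±0.07 (chosen to resemble a small uniform initialiser) are illustrative assumptions.

```python
import random


def dense_relu_sketch(inputs, weights, bias):
    """Compute relu(inputs . weights^T + bias) for a batch of row vectors."""
    outputs = []
    for row in inputs:
        out_row = []
        for unit_weights, unit_bias in zip(weights, bias):
            pre_activation = sum(x * w for x, w in zip(row, unit_weights)) + unit_bias
            out_row.append(max(0.0, pre_activation))
        outputs.append(out_row)
    return outputs


rng = random.Random(0)
# 20 units, each with 5 small random input weights and a zero bias.
weights = [[rng.uniform(-0.07, 0.07) for _ in range(5)] for _ in range(20)]
bias = [0.0] * 20
outputs = dense_relu_sketch([[0.0] * 5 for _ in range(5)], weights, bias)
print(len(outputs), len(outputs[0]))  # batch of 5 rows, 20 units each
```

Feeding a non-zero input is a quick way to confirm that the weights were in fact initialised.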

gluon.rnn

Gluon provides a large number of built-in recurrent neural network (RNN) layers in the gluon.rnn module. That is the reason it is called the core module.

Methods and their parameters

Following are some of the important methods and their parameters covered by the mxnet.gluon.rnn core module:

Methods and its Parameters

Definition

BidirectionalCell(l_cell, r_cell[, …])

It is used for Bidirectional Recurrent Neural Network (RNN) cell.

DropoutCell(rate[, axes, prefix, params])

This method will apply dropout on the given input.

GRU(hidden_size[, num_layers, layout, …])

It applies a multi-layer gated recurrent unit (GRU) RNN to a given input sequence.

GRUCell(hidden_size[, …])

It is used for Gated Recurrent Unit (GRU) network cell.

HybridRecurrentCell([prefix, params])

This method supports hybridize.

HybridSequentialRNNCell([prefix, params])

With the help of this method we can sequentially stack multiple HybridRNN cells.

LSTM(hidden_size[, num_layers, layout, …])

It applies a multi-layer long short-term memory (LSTM) RNN to a given input sequence.

LSTMCell(hidden_size[, …])

It is used for Long-Short Term Memory (LSTM) network cell.

ModifierCell(base_cell)

It is the Base class for modifier cells.

RNN(hidden_size[, num_layers, activation, …])

It applies a multi-layer Elman RNN with tanh or ReLU non-linearity to a given input sequence.

RNNCell(hidden_size[, activation, …])

It is used for Elman RNN recurrent neural network cell.

RecurrentCell([prefix, params])

It represents the abstract base class for RNN cells.

SequentialRNNCell([prefix, params])

With the help of this method we can sequentially stack multiple RNN cells.

ZoneoutCell(base_cell[, zoneout_outputs, …])

This method applies Zoneout on the base cell.

Implementation Examples

In the example below, we use GRU(), which applies a multi-layer gated recurrent unit (GRU) RNN to a given input sequence.

layer = mx.gluon.rnn.GRU(100, 3)
layer.initialize()
input_seq = mx.nd.random.uniform(shape=(5, 3, 10))
out_seq = layer(input_seq)
h0 = mx.nd.random.uniform(shape=(3, 3, 100))
out_seq, hn = layer(input_seq, h0)
out_seq

Output

This produces the following output:

[[[ 1.50152072e-01 5.19012511e-01 1.02390535e-01 ... 4.35803324e-01
1.30406499e-01 3.30152437e-02]
[ 2.91542172e-01 1.02243155e-01 1.73325196e-01 ... 5.65296151e-02
1.76546033e-02 1.66693389e-01]
[ 2.22257316e-01 3.76294643e-01 2.11277917e-01 ... 2.28903517e-01
3.43954474e-01 1.52770668e-01]]


[[ 1.40634328e-01 2.93247789e-01 5.50393537e-02 ... 2.30207980e-01
6.61415309e-02 2.70989928e-02]
[ 1.11081995e-01 7.20834285e-02 1.08342394e-01 ... 2.28330195e-02
6.79589901e-03 1.25501186e-01]
[ 1.15944080e-01 2.41565228e-01 1.18612610e-01 ... 1.14908054e-01
1.61080107e-01 1.15969211e-01]]
………………………….

Example

hn

Output

This produces the following output:

[[[-6.08105101e-02 3.86217088e-02   6.64453954e-03 8.18805695e-02
3.85607071e-02 -1.36945639e-02 7.45836645e-03 -5.46515081e-03
9.49622393e-02 6.39371723e-02 -6.37890724e-03 3.82240303e-02
9.11015049e-02 -2.01375950e-02 -7.29381144e-02 6.93765879e-02
2.71829776e-02 -6.64435029e-02 -8.45306814e-02 -1.03075653e-01
6.72040805e-02 -7.06537142e-02 -3.93818803e-02 5.16211614e-03
-4.79770005e-02 1.10734522e-01 1.56721435e-02 -6.93409378e-03
1.16915874e-01 -7.95962065e-02 -3.06530762e-02 8.42394680e-02
7.60370195e-02 2.17055440e-01 9.85361822e-03 1.16660878e-01
4.08297703e-02 1.24978097e-02 8.25245082e-02 2.28673983e-02
-7.88266212e-02 -8.04114193e-02 9.28791538e-02 -5.70827350e-03
-4.46166918e-02 -6.41122833e-02 1.80885363e-02 -2.37745279e-03
4.37298454e-02 1.28888980e-01 -3.07202265e-02 2.50503756e-02
4.00907174e-02 3.37077095e-03 -1.78839862e-02 8.90695080e-02
6.30150884e-02 1.11416787e-01 2.12221760e-02 -1.13236710e-01
5.39616570e-02 7.80710578e-02 -2.28817668e-02 1.92073174e-02
………………………….
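The shapes in the GRU example follow a fixed pattern under the default 'TNC' layout: the input is (sequence length, batch size, input size), the output is (sequence length, batch size, hidden size) and the state is (number of layers, batch size, hidden size). The helper below is a hypothetical plain-Python sketch that only computes those expected shapes. It does not run a network.

```python
def rnn_shapes_sketch(seq_len, batch_size, hidden_size, num_layers, bidirectional=False):
    """Return the expected (output_shape, state_shape) for a 'TNC' layout RNN layer."""
    directions = 2 if bidirectional else 1
    output_shape = (seq_len, batch_size, directions * hidden_size)
    state_shape = (num_layers * directions, batch_size, hidden_size)
    return output_shape, state_shape


# GRU(100, 3) applied to an input of shape (5, 3, 10).
output_shape, state_shape = rnn_shapes_sketch(seq_len=5, batch_size=3, hidden_size=100, num_layers=3)
print(output_shape, state_shape)
```

This is why h0 is created with shape (3, 3, 100) in the example, and why hn comes back with the same shape.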

In the example below, we use LSTM(), which applies a long short-term memory (LSTM) RNN to a given input sequence.

layer = mx.gluon.rnn.LSTM(100, 3)
layer.initialize()

input_seq = mx.nd.random.uniform(shape=(5, 3, 10))
out_seq = layer(input_seq)
h0 = mx.nd.random.uniform(shape=(3, 3, 100))
c0 = mx.nd.random.uniform(shape=(3, 3, 100))
out_seq, hn = layer(input_seq,[h0,c0])
out_seq

Output

The output is shown below:

[[[ 9.00025964e-02 3.96071747e-02 1.83841765e-01 ... 3.95872220e-02
1.25569820e-01 2.15555862e-01]
[ 1.55962542e-01 -3.10300849e-02 1.76772922e-01 ... 1.92474753e-01
2.30574399e-01 2.81707942e-02]
[ 7.83204585e-02 6.53361529e-03 1.27262697e-01 ... 9.97719541e-02
1.28254429e-01 7.55299702e-02]]
[[ 4.41036932e-02 1.35250352e-02 9.87644792e-02 ... 5.89378644e-03
5.23949116e-02 1.00922674e-01]
[ 8.59075040e-02 -1.67027581e-02 9.69351009e-02 ... 1.17763653e-01
9.71239135e-02 2.25218050e-02]
[ 4.34580036e-02 7.62207608e-04 6.37005866e-02 ... 6.14888743e-02
5.96345589e-02 4.72368896e-02]]
……………

Example

hn

Output

When you run the code, you will see the following output:

[
[[[ 2.21408084e-02 1.42750628e-02 9.53067932e-03 -1.22849066e-02
1.78788435e-02 5.99269159e-02 5.65306023e-02 6.42553642e-02
6.56616641e-03 9.80876666e-03 -1.15729487e-02 5.98640442e-02
-7.21173314e-03 -2.78371759e-02 -1.90690923e-02 2.21447181e-02
8.38765781e-03 -1.38521893e-02 -9.06938594e-03 1.21346042e-02
6.06449470e-02 -3.77471633e-02 5.65885007e-02 6.63008019e-02
-7.34188128e-03 6.46054149e-02 3.19911093e-02 4.11194898e-02
4.43960279e-02 4.92892228e-02 1.74766723e-02 3.40303481e-02
-5.23341820e-03 2.68163737e-02 -9.43402853e-03 -4.11836170e-02
1.55221792e-02 -5.05655073e-02 4.24557598e-03 -3.40388380e-02
……………………

Training Modules

The training modules in Gluon are as follows:

gluon.loss

In the mxnet.gluon.loss module, Gluon provides pre-defined loss functions, that is, the losses used for training neural networks. That is the reason it is called the training module.

Methods and their parameters

Following are some of the important methods and their parameters covered by the mxnet.gluon.loss training module:

Methods and its Parameters

Definition

Loss(weight, batch_axis, **kwargs)

This acts as the base class for loss.

L2Loss([weight, batch_axis])

It calculates the mean squared error (MSE) between label and prediction(pred).

L1Loss([weight, batch_axis])

It calculates the mean absolute error (MAE) between label and pred.

SigmoidBinaryCrossEntropyLoss([…])

This method is used for the cross-entropy loss for binary classification.

SigmoidBCELoss

This method is used for the cross-entropy loss for binary classification.

SoftmaxCrossEntropyLoss([axis, …])

It computes the softmax cross-entropy loss (CEL).

SoftmaxCELoss

It also computes the softmax cross entropy loss.

KLDivLoss([from_logits, axis, weight, …])

It is used for the Kullback-Leibler divergence loss.

CTCLoss([layout, label_layout, weight])

It is used for Connectionist Temporal Classification (CTC) loss.

HuberLoss([rho, weight, batch_axis])

It calculates smoothed L1 loss. The smoothed L1 loss will be equal to L1 loss if absolute error exceeds rho but is equal to L2 loss otherwise.

HingeLoss([margin, weight, batch_axis])

This method calculates the hinge loss function often used in SVMs: L = \sum_i \max(0, margin - pred_i \cdot label_i), where each label is -1 or 1.

SquaredHingeLoss([margin, weight, batch_axis])

This method calculates the soft-margin loss function used in SVMs: L = \sum_i \max(0, margin - pred_i \cdot label_i)^2, where each label is -1 or 1.

LogisticLoss([weight, batch_axis, label_format])

This method calculates the logistic loss.

TripletLoss([margin, weight, batch_axis])

This method calculates triplet loss given three input tensors and a positive margin.

PoissonNLLLoss([weight, from_logits, …])

The function calculates the Poisson negative log-likelihood loss.

CosineEmbeddingLoss([weight, batch_axis, margin])

The function computes the cosine distance between the vectors.

SDMLLoss([smoothing_parameter, weight, …])

This method calculates the Batchwise Smoothed Deep Metric Learning (SDML) loss given two input tensors and a smoothing weight. It learns similarity between paired samples by using unpaired samples in the minibatch as potential negative examples.

Example

As we know, mxnet.gluon.loss.L2Loss calculates the mean squared error (MSE) between label and prediction (pred). It does so with the help of the following formula:

L = \frac{1}{2} \sum_i |label_i - pred_i|^2
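To make the arithmetic concrete, here is a minimal plain-Python sketch of the L2 loss and, for comparison, the Huber loss from the table above. Both function names are hypothetical. Averaging over the elements of one sample is an assumption about how Gluon reduces the loss, so treat the exact scaling as illustrative.

```python
def l2_loss_sketch(pred, label):
    """Half the squared error, averaged over the elements of one sample."""
    return sum(0.5 * (l - p) ** 2 for p, l in zip(pred, label)) / len(pred)


def huber_loss_sketch(pred, label, rho=1.0):
    """Quadratic for small errors, linear once the absolute error exceeds rho."""
    total = 0.0
    for p, l in zip(pred, label):
        error = abs(l - p)
        if error > rho:
            total += error - 0.5 * rho
        else:
            total += 0.5 / rho * error ** 2
    return total / len(pred)


pred = [1.0, 2.0, 5.0]
label = [1.0, 4.0, 5.5]
l2_value = l2_loss_sketch(pred, label)
huber_value = huber_loss_sketch(pred, label)
print(l2_value, huber_value)
```

The error of 2.0 on the second element costs 2.0 under the L2 loss but only 1.5 under the Huber loss, which is the point of the smoothed L1 variant: large errors are penalised linearly instead of quadratically.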

gluon.parameter

mxnet.gluon.parameter is a container that holds the parameters, i.e. the weights, of Blocks.

Methods and their parameters

Following are some of the important methods and their parameters covered by the mxnet.gluon.parameter training module:

Methods and its Parameters

Definition

cast(dtype)

This method will cast data and gradient of this Parameter to a new data type.

data([ctx])

This method will return a copy of this parameter on one context.

grad([ctx])

This method will return a gradient buffer for this parameter on one context.

initialize([init, ctx, default_init, …])

This method will initialize parameter and gradient arrays.

list_ctx()

This method will return a list of contexts this parameter is initialized on.

list_data()

This method will return copies of this parameter on all contexts. It will be done in the same order as creation.

list_grad()

This method will return gradient buffers on all contexts. This will be done in the same order as values().

list_row_sparse_data(row_id)

This method will return copies of the ‘row_sparse’ parameter on all contexts. This will be done in the same order as creation.

reset_ctx(ctx)

This method will re-assign Parameter to other contexts.

row_sparse_data(row_id)

This method will return a copy of the ‘row_sparse’ parameter on the same context as row_id’s.

set_data(data)

This method will set this parameter’s value on all contexts.

var()

This method will return a symbol representing this parameter.

zero_grad()

This method will set the gradient buffer on all contexts to 0.

Implementation Example

In the example below, we initialize the parameter and gradient arrays by using the initialize() method as follows:

weight = mx.gluon.Parameter('weight', shape=(2, 2))
weight.initialize(ctx=mx.cpu(0))
weight.data()

Output

The output is shown below:

[[-0.0256899 0.06511251]
[-0.00243821 -0.00123186]]
<NDArray 2x2 @cpu(0)>

Example

weight.grad()

Output

The output is given below:

[[0. 0.]
[0. 0.]]
<NDArray 2x2 @cpu(0)>

Example

weight.initialize(ctx=[mx.gpu(0), mx.gpu(1)])
weight.data(mx.gpu(0))

Output

You will see the following output:

[[-0.00873779 -0.02834515]
 [ 0.05484822 -0.06206018]]
<NDArray 2x2 @gpu(0)>

Example

weight.data(mx.gpu(1))

Output

When you execute the above code, you should see the following output:

[[-0.00873779 -0.02834515]
 [ 0.05484822 -0.06206018]]
<NDArray 2x2 @gpu(1)>

gluon.trainer

mxnet.gluon.trainer applies an Optimizer to a set of parameters. It should be used together with autograd.

Methods and their parameters

Following are some of the important methods and their parameters covered by the mxnet.gluon.trainer training module:

Methods and its Parameters

Definition

allreduce_grads()

This method will reduce the gradients from different contexts for each parameter (weight).

load_states(fname)

As name implies, this method will load trainer states.

save_states(fname)

As name implies, this method will save trainer states.

set_learning_rate(lr)

This method will set a new learning rate of the optimizer.

step(batch_size[, ignore_stale_grad])

This method will make one step of parameter update. It should be called after autograd.backward() and outside of record() scope.

update(batch_size[, ignore_stale_grad])

This method will also make one step of parameter update. It should be called after autograd.backward(), outside of record() scope, and after trainer.allreduce_grads().
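The batch_size argument of step() and update() matters because the gradient is normalised by 1/batch_size before the optimizer applies it. The plain-Python sketch below shows that rescaling for a plain SGD update. The function name is hypothetical and a real Trainer delegates the update to whichever optimizer it was given.

```python
def sgd_step_sketch(weights, grads, learning_rate, batch_size):
    """One SGD update with the gradient rescaled by 1 / batch_size."""
    scale = 1.0 / batch_size
    return [w - learning_rate * scale * g for w, g in zip(weights, grads)]


weights = [1.0, -2.0]
grads = [4.0, 8.0]  # gradients summed over a batch of 4 samples
updated = sgd_step_sketch(weights, grads, learning_rate=0.1, batch_size=4)
print(updated)
```

If the loss has already been averaged over the batch, passing a batch size of 1 avoids dividing twice.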

Data Modules

The data modules of Gluon are explained below:

gluon.data

Gluon provides a large number of built-in dataset utilities in the gluon.data module. That is the reason it is called the data module.

Classes and their parameters

Following are some of the important classes and their parameters covered by the mxnet.gluon.data core module. These classes are typically related to Datasets, Sampling, and DataLoader.

Methods and its Parameters

Definition

ArrayDataset(*args)

This class represents a dataset that combines two or more dataset-like objects, for example Datasets, lists, or arrays.

BatchSampler(sampler, batch_size[, last_batch])

This method wraps over another Sampler. Once wrapped it returns the mini batches of samples.

DataLoader(dataset[, batch_size, shuffle, …])

Similar to BatchSampler but this method loads data from a dataset. Once loaded it returns the mini batches of data.

Dataset

This represents the abstract dataset class.

FilterSampler(fn, dataset)

This class samples the elements of a Dataset for which fn (function) returns True.

RandomSampler(length)

This class samples elements from [0, length) randomly without replacement.

RecordFileDataset(filename)

It represents a dataset wrapping over a RecordIO file. The extension of the file is .rec.

Sampler

This is the base class for samplers.

SequentialSampler(length[, start])

It samples elements from the set [start, start+length) sequentially.

Implementation Examples

In the example below, we use the gluon.data.BatchSampler() API, which wraps another sampler and returns mini-batches of samples.

import mxnet as mx
from mxnet.gluon import data
sampler = mx.gluon.data.SequentialSampler(15)
batch_sampler = mx.gluon.data.BatchSampler(sampler, 4, 'keep')
list(batch_sampler)

Output

The output is shown below:

[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14]]
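The last batch above has only three elements because 15 is not a multiple of 4 and the third argument is 'keep'. The plain-Python sketch below mimics that grouping logic for the 'keep' and 'discard' options. The function name is hypothetical, and the 'rollover' option, which carries leftovers into the next epoch, is deliberately not modelled.

```python
def batch_sampler_sketch(indices, batch_size, last_batch="keep"):
    """Group indices into batches; keep or discard an incomplete final batch."""
    if last_batch not in ("keep", "discard"):
        raise ValueError("this sketch only models 'keep' and 'discard'")
    batches, batch = [], []
    for index in indices:
        batch.append(index)
        if len(batch) == batch_size:
            batches.append(batch)
            batch = []
    if batch and last_batch == "keep":
        batches.append(batch)
    return batches


kept = batch_sampler_sketch(range(15), 4, "keep")
discarded = batch_sampler_sketch(range(15), 4, "discard")
print(kept)
print(discarded)
```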

gluon.data.vision.datasets

Gluon provides a large number of pre-defined vision datasets in the gluon.data.vision.datasets module.

Classes and their parameters

MXNet provides useful and important datasets, whose classes and parameters are given below:

Classes and its Parameters

Definition

MNIST([root, train, transform])

This dataset provides images of handwritten digits. The URL for the MNIST dataset is http://yann.lecun.com/exdb/mnist

FashionMNIST([root, train, transform])

This dataset consists of Zalando’s article images of fashion products. It is a drop-in replacement for the original MNIST dataset. You can get this dataset from https://github.com/zalandoresearch/fashion-mnist

CIFAR10([root, train, transform])

This is an image classification dataset from https://www.cs.toronto.edu/~kriz/cifar.html. In this dataset each sample is an image with shape (32, 32, 3).

CIFAR100([root, fine_label, train, transform])

This is the CIFAR100 image classification dataset from https://www.cs.toronto.edu/~kriz/cifar.html. Here, too, each sample is an image with shape (32, 32, 3).

ImageRecordDataset(filename[, flag, transform])

This dataset is wrapping over a RecordIO file that contains images. In this each sample is an image with its corresponding label.

ImageFolderDataset(root[, flag, transform])

This is a dataset for loading image files that are stored in a folder structure.

ImageListDataset([root, imglist, flag])

This is a dataset for loading image files that are specified by a list of entries.

Example

In the example below, we show the two input formats accepted by ImageListDataset(), which is used for loading image files that are specified by a list of entries:

# written to text file *.lst

0 0 root/cat/0001.jpg
1 0 root/cat/xxxa.jpg
2 0 root/cat/yyyb.jpg
3 1 root/dog/123.jpg
4 1 root/dog/023.jpg
5 1 root/dog/wwww.jpg

# A pure list, each item is a list [imagelabel: float or list of float, imgpath]

[[0, 'root/cat/0001.jpg'],
 [0, 'root/cat/xxxa.jpg'],
 [0, 'root/cat/yyyb.jpg'],
 [1, 'root/dog/123.jpg'],
 [1, 'root/dog/023.jpg'],
 [1, 'root/dog/wwww.jpg']]
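The two formats carry the same information: each line of the .lst file holds an index, one or more label values and an image path, while the pure list drops the index. The plain-Python sketch below converts the first format into the second. The function name is hypothetical, and splitting on whitespace is an assumption that would break for paths containing spaces.

```python
def parse_lst_sketch(text):
    """Turn '.lst' lines (index, label(s), path) into [label, path] entries."""
    entries = []
    for line in text.strip().splitlines():
        fields = line.split()
        # fields[0] is the running index; everything before the path is a label.
        labels = [float(value) for value in fields[1:-1]]
        label = labels[0] if len(labels) == 1 else labels
        entries.append([label, fields[-1]])
    return entries


lst_text = """
0 0 root/cat/0001.jpg
3 1 root/dog/123.jpg
"""
entries = parse_lst_sketch(lst_text)
print(entries)
```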

Utility Modules

The utility modules in Gluon are as follows:

gluon.utils

Gluon provides a number of built-in utilities in the gluon.utils module, including helpers for parallelisation and a variety of other utilities for training. That is the reason it is called the utility module.

Functions and their parameters

Following are the functions and their parameters contained in the utility module gluon.utils:

Functions and its Parameters

Definition

split_data(data, num_slice[, batch_axis, …])

This function is usually used for data parallelism, where each slice is sent to one device, i.e. a GPU. It splits an NDArray into num_slice slices along batch_axis.

split_and_load(data, ctx_list[, batch_axis, …])

This function splits an NDArray into len(ctx_list) slices along batch_axis. The only difference from above split_data () function is that, it also loads each slice to one context in ctx_list.

clip_global_norm(arrays, max_norm[, …])

The job of this function is to rescale NDArrays in such a way that the sum of their 2-norm is smaller than max_norm.

check_sha1(filename, sha1_hash)

This function will check whether the sha1 hash of the file content matches the expected hash or not.

download(url[, path, overwrite, sha1_hash, …])

As the name specifies, this function downloads the file at a given URL.

replace_file(src, dst)

This function implements an atomic os.replace on Linux and OSX.
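To make the behaviour of split_data() and clip_global_norm() more concrete, here is a minimal plain-Python sketch that works on lists instead of NDArrays. It is not the gluon.utils implementation: it assumes the batch size divides evenly by num_slice, and it interprets the norm as the usual global 2-norm taken over all arrays together.

```python
import math


def split_data(batch, num_slice):
    """Split a list of samples into num_slice equal slices along the batch axis."""
    if len(batch) % num_slice != 0:
        raise ValueError("this sketch requires an evenly divisible batch")
    step = len(batch) // num_slice
    return [batch[i * step:(i + 1) * step] for i in range(num_slice)]


def clip_global_norm(arrays, max_norm):
    """Rescale the arrays in place so their joint 2-norm is at most max_norm."""
    total_norm = math.sqrt(sum(v * v for arr in arrays for v in arr))
    scale = max_norm / (total_norm + 1e-8)
    if scale < 1.0:  # only shrink, never enlarge
        for arr in arrays:
            for i, v in enumerate(arr):
                arr[i] = v * scale
    return total_norm


slices = split_data([[1, 2], [3, 4], [5, 6], [7, 8]], num_slice=2)
grads = [[3.0, 0.0], [0.0, 4.0]]
norm_before = clip_global_norm(grads, max_norm=1.0)
print(slices, norm_before, grads)
```

Each slice would go to one device in data-parallel training, and the clipped gradients keep their direction while their joint norm is capped.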

Python API Autograd and Initializer

This chapter deals with the autograd and initializer APIs in MXNet.

mxnet.autograd

This is MXNet’s autograd API for NDArray. It has the following class −

Class: Function()

It is used for customised differentiation in autograd. It can be written as mxnet.autograd.Function. If, for any reason, the user does not want to use the gradients that are computed by the default chain rule, the Function class of mxnet.autograd can be used to customize the differentiation. It has two methods, namely forward() and backward().

Let us understand the working of this class with the help of the following points −

First, we need to define our computation in the forward method.
Then, we need to provide the customized differentiation in the backward method.
Now during gradient computation, instead of the default chain-rule gradient, mxnet.autograd will use the backward function defined by the user. We can also cast to a numpy array and back for some operations in forward as well as backward.

Example

Before using the mxnet.autograd.Function class, let’s define a stable sigmoid function with forward as well as backward methods as follows −

import mxnet as mx

class sigmoid(mx.autograd.Function):
   def forward(self, x):
      # Compute the sigmoid and keep the result for the backward pass
      y = 1 / (1 + mx.nd.exp(-x))
      self.save_for_backward(y)
      return y

   def backward(self, dy):
      # d(sigmoid)/dx = y * (1 - y), scaled by the incoming gradient dy
      y, = self.saved_tensors
      return dy * y * (1-y)

Now, the Function class can be used as follows −

func = sigmoid()
x = mx.nd.random.uniform(shape=(10,))
x.attach_grad()
with mx.autograd.record():
   m = func(x)
m.backward()
dx_grad = x.grad.asnumpy()
dx_grad

Output

When you run the code, you will see output similar to the following (the exact values depend on the random input) −

array([0.21458015, 0.21291625, 0.23330082, 0.2361367 , 0.23086983,
0.24060014, 0.20326573, 0.21093895, 0.24968489, 0.24301809],
dtype=float32)
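The backward rule used above, dy * y * (1 - y), can be sanity-checked without MXNet. The following plain-Python sketch compares it against a central finite difference; the helper names are illustrative only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    # Same formula as the custom backward(): y * (1 - y)
    y = sigmoid(x)
    return y * (1.0 - y)


def numeric_grad(f, x, eps=1e-6):
    # Central finite difference approximation of f'(x)
    return (f(x + eps) - f(x - eps)) / (2 * eps)


max_error = max(
    abs(sigmoid_grad(x) - numeric_grad(sigmoid, x))
    for x in (0.0, 0.25, 0.5, 0.99)
)
grad_at_zero = sigmoid_grad(0.0)
grad_at_one = sigmoid_grad(1.0)
print(max_error, grad_at_zero, grad_at_one)
```

Since the inputs are drawn uniformly from [0, 1), every gradient should fall between sigmoid_grad(1.0) and sigmoid_grad(0.0), which is consistent with the values printed above.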

Methods and their parameters

Following are the methods of the mxnet.autograd.Function class and the functions of the mxnet.autograd module, along with their parameters −

Methods and their Parameters

Definition

forward(*inputs)

This method of the Function class is used for forward computation.

backward(heads[, head_grads, retain_graph, …])

This function is used for backward computation. It computes the gradients of heads with respect to previously marked variables. The backward() method of the Function class, in contrast, takes as many inputs as forward’s outputs and returns as many NDArrays as forward’s inputs.

get_symbol(x)

This method is used to retrieve recorded computation history as Symbol.

grad(heads, variables[, head_grads, …])

This method computes the gradients of heads with respect to variables. Once computed, instead of storing into variable.grad, gradients will be returned as new NDArrays.

is_recording()

With the help of this function we can check whether autograd is currently recording or not.

is_training()

With the help of this function we can check whether autograd is in training mode or in predicting mode.

mark_variables(variables, gradients[, grad_reqs])

This function marks NDArrays as variables for which autograd computes gradients. It is the same as calling .attach_grad() on a variable, the only difference being that with this call we can set the gradient to any value.

pause([train_mode])

This method returns a scope context to be used in ‘with’ statement for codes which do not need gradients to be calculated.

predict_mode()

This method returns a scope context to be used in ‘with’ statement in which forward pass behavior is set to inference mode and that is without changing the recording states.

record([train_mode])

It will return an autograd recording scope context to be used in ‘with’ statement and captures code which needs gradients to be calculated.

set_recording(is_recording)

As the counterpart of is_recording(), this function sets the status to recording or not recording.

set_training(is_training)

As the counterpart of is_training(), this function sets the status to training or predicting.

train_mode()

This method will return a scope context to be used in ‘with’ statement in which forward pass behavior is set to training mode and that is without changing the recording states.
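How the scope functions in the table interact can be illustrated with a toy model in plain Python. This is not MXNet code: it only mimics the described behaviour with two global flags, and it assumes that the setters return the previous state and that record() defaults to training mode while pause() defaults to predicting mode.

```python
from contextlib import contextmanager

_state = {"recording": False, "training": False}


def is_recording():
    return _state["recording"]


def is_training():
    return _state["training"]


def set_recording(flag):
    previous, _state["recording"] = _state["recording"], flag
    return previous  # previous state, so a scope can restore it


def set_training(flag):
    previous, _state["training"] = _state["training"], flag
    return previous


@contextmanager
def _scope(recording, training):
    # None means "leave this flag untouched"
    prev_rec = set_recording(recording) if recording is not None else None
    prev_train = set_training(training) if training is not None else None
    try:
        yield
    finally:
        if recording is not None:
            set_recording(prev_rec)
        if training is not None:
            set_training(prev_train)


def record(train_mode=True):
    return _scope(True, train_mode)


def pause(train_mode=False):
    return _scope(False, train_mode)


def predict_mode():
    return _scope(None, False)


log = []
with record():
    log.append((is_recording(), is_training()))
    with predict_mode():          # recording state is left unchanged
        log.append((is_recording(), is_training()))
    with pause():                 # no gradients needed in this block
        log.append((is_recording(), is_training()))
log.append((is_recording(), is_training()))
print(log)
```

The point of the sketch is that each scope restores the previous flags on exit, which is why the scopes can be nested freely.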

Implementation Example

In the example below, we use the mxnet.autograd.grad() function to compute the gradient of head with respect to variables −

x = mx.nd.ones((2,))
x.attach_grad()
with mx.autograd.record():
   z = mx.nd.elemwise_add(mx.nd.exp(x), x)
dx_grad = mx.autograd.grad(z, [x], create_graph=True)
dx_grad

Output

The output is given below −

[
[3.7182817 3.7182817]
<NDArray 2 @cpu(0)>]
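The value 3.7182817 can be checked by hand: for z = exp(x) + x the derivative is exp(x) + 1, which at x = 1 equals e + 1. A short plain-Python check, using only the standard math module:

```python
import math


def f(x):
    return math.exp(x) + x


def df(x):
    # Analytic derivative of exp(x) + x
    return math.exp(x) + 1.0


analytic = df(1.0)
numeric = (f(1.0 + 1e-6) - f(1.0 - 1e-6)) / 2e-6  # central difference
print(analytic, numeric)
```

Both numbers agree with the float32 value printed by MXNet up to rounding.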

We can use the mxnet.autograd.predict_mode() function to return a scope to be used in a ‘with’ statement. Here model and sampling stand for user-defined functions −

with mx.autograd.record():
   y = model(x)
   with mx.autograd.predict_mode():
      y = sampling(y)
   mx.autograd.backward([y])

mxnet.initializer

This is MXNet’s API for weight initializers. It has the following classes −

Classes and their parameters

Following are the classes of the mxnet.initializer module and their parameters:

Classes and its Parameters

Definition

Bilinear()

With the help of this class we can initialize weight for up-sampling layers.

Constant(value)

This class initializes the weights to a given value. The value can be a scalar as well as NDArray that matches the shape of the parameter to be set.

FusedRNN(init, num_hidden, num_layers, mode)

As the name implies, this class initializes parameters for the fused Recurrent Neural Network (RNN) layers.

InitDesc

It acts as the descriptor for the initialization pattern.

Initializer(**kwargs)

This is the base class of an initializer.

LSTMBias([forget_bias])

This class initializes all biases of an LSTMCell to 0.0, except for the forget gate, whose bias is set to a custom value.

Load(param[, default_init, verbose])

This class initializes the variables by loading data from a file or dictionary.

MSRAPrelu([factor_type, slope])

As the name implies, this class initializes the weights according to the MSRA paper.

Mixed(patterns, initializers)

It initializes the parameters using multiple initializers.

Normal([sigma])

Normal() class initializes weights with random values sampled from a normal distribution with a mean of zero and standard deviation (SD) of sigma.

One()

It initializes the weights of the parameter to one.

Orthogonal([scale, rand_type])

As the name implies, this class initializes the weight as an orthogonal matrix.

Uniform([scale])

It initializes weights with random values which are uniformly sampled from a given range.

Xavier([rnd_type, factor_type, magnitude])

It returns an initializer that performs “Xavier” initialization for weights.

Zero()

It initializes the weights of the parameter to zero.
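As an illustration of what an initializer such as Xavier computes, here is a plain-Python sketch for a 2-D weight of shape (out, in). It is not MXNet's code: it assumes the uniform variant, where values are drawn from [-scale, scale] with scale = sqrt(magnitude / factor) and the factor is the fan-in, the fan-out, or their average depending on factor_type.

```python
import math
import random


def xavier_uniform(shape, factor_type="avg", magnitude=3.0, seed=0):
    """Sketch of uniform Xavier initialization for a 2-D weight (out, in)."""
    fan_out, fan_in = shape
    factor = {
        "in": fan_in,
        "out": fan_out,
        "avg": (fan_in + fan_out) / 2.0,
    }[factor_type]
    scale = math.sqrt(magnitude / factor)
    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    weights = [[rng.uniform(-scale, scale) for _ in range(fan_in)]
               for _ in range(fan_out)]
    return scale, weights


scale, weights = xavier_uniform((4, 9), factor_type="in", magnitude=2.45)
print(scale)
print(weights[0])
```

A larger fan-in gives a smaller scale, which keeps the variance of the layer output roughly independent of the layer width.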

Implementation Example

In the example below, we use the mxnet.init.Normal() class to create an initializer and retrieve its parameters −

init = mx.init.Normal(0.8)
init.dumps()

Output

The output is given below −

'["normal", {"sigma": 0.8}]'

Example

init = mx.init.Xavier(factor_type="in", magnitude=2.45)
init.dumps()

Output

The output is shown below −

'["xavier", {"rnd_type": "uniform", "factor_type": "in", "magnitude": 2.45}]'
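The strings returned by dumps() above are JSON: a two-element list holding the initializer name and its keyword arguments. A small sketch with the standard json module, using the exact string shown above, reads them back:

```python
import json

dumped = '["xavier", {"rnd_type": "uniform", "factor_type": "in", "magnitude": 2.45}]'

# The dump is a JSON list of [name, kwargs]
name, kwargs = json.loads(dumped)
print(name)
print(kwargs["factor_type"], kwargs["magnitude"])
```

This makes the dump convenient for logging or for storing the initializer configuration next to a model.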

In the example below, we use the mxnet.initializer.Mixed() class to initialize parameters using multiple initializers −

# module is assumed to be an already bound mx.mod.Module
init = mx.initializer.Mixed(['bias', '.*'], [mx.init.Zero(),
                                             mx.init.Uniform(0.1)])
module.init_params(init)

for dictionary in module.get_params():
   for key in dictionary:
      print(key)
      print(dictionary[key].asnumpy())

Output

The output is shown below −

fullyconnected1_weight
[[ 0.0097627 0.01856892 0.04303787]]
fullyconnected1_bias
[ 0.]

Apache MXNet - Python API Symbol

In this chapter, we will learn about an interface in MXNet which is termed as Symbol.

mxnet.symbol

Apache MXNet’s Symbol API is an interface for symbolic programming. The Symbol API features the use of the following −

Computational graphs
Reduced memory usage
Pre-use function optimization

The example given below shows how one can create a simple expression by using MXNet’s Symbol API −

import mxnet as mx
# Two placeholders, namely x and y, are created with mx.sym.Variable
x = mx.sym.Variable('x')
y = mx.sym.Variable('y')
# The symbol here is constructed using the plus ‘+’ operator.
z = x + y
z

Output

You will see the following output −

<Symbol _plus0>

Example

(x, y, z)

Output

The output is given below −

(<Symbol x>, <Symbol y>, <Symbol _plus0>)
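The idea behind symbolic programming, where an expression first builds a graph and is only evaluated once values are bound, can be sketched with a tiny toy class in plain Python. The class Sym below is hypothetical and far simpler than MXNet's Symbol; it only illustrates the principle.

```python
class Sym:
    """Toy symbolic node: either a named placeholder or an operation."""

    def __init__(self, name, op=None, inputs=()):
        self.name, self.op, self.inputs = name, op, inputs

    def __add__(self, other):
        # Building the expression records a graph node; nothing is computed yet
        return Sym("_plus", op=lambda a, b: a + b, inputs=(self, other))

    def eval(self, **bindings):
        if self.op is None:           # placeholder: look up the bound value
            return bindings[self.name]
        args = [node.eval(**bindings) for node in self.inputs]
        return self.op(*args)

    def __repr__(self):
        return "<Sym %s>" % self.name


x, y = Sym("x"), Sym("y")
z = x + y                 # only constructs the graph
result = z.eval(x=2, y=3)  # computation happens here
print(z, result)
```

Because the whole graph is known before it runs, a real framework can optimize it and plan memory use ahead of time.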

Now let us discuss in detail the classes, functions, and parameters of the Symbol API of MXNet.

Classes

Following table consists of the classes of the Symbol API of MXNet −

Class

Definition

Symbol(handle)

The Symbol class represents the symbolic graph of Apache MXNet.

Functions and their parameters

Following are some of the important functions and their parameters covered by mxnet.Symbol API −

Function and its Parameters

Definition

Activation([data, act_type, out, name])

It applies an activation function element-wise to the input. It supports relu, sigmoid, tanh, softrelu, softsign activation functions.

BatchNorm([data, gamma, beta, moving_mean, …])

It is used for batch normalization. This function normalizes a data batch by mean and variance. It applies a scale gamma and an offset beta.

BilinearSampler([data, grid, cudnn_off, …])

BlockGrad([data, out, name])

As name specifies, this function stops gradient computation. It basically stops the accumulated gradient of the inputs from flowing through this operator in backward direction.

cast([data, dtype, out, name])

This function will cast all elements of the input to a new type.

zeros(shape[, dtype])

This function, as the name specifies, returns a new symbol of given shape and type, filled with zeros.

ones(shape[, dtype])

This function, as the name specifies, returns a new symbol of given shape and type, filled with ones.

full(shape, val[, dtype])

This function, as the name specifies, returns a new array of given shape and type, filled with the given value val.

arange(start[, stop, step, repeat, …])

It will return evenly spaced values within a given interval. The values are generated within half open interval [start, stop) which means that the interval includes start but excludes stop.

linspace(start, stop, num[, endpoint, name, …])

It will return num evenly spaced numbers within a specified interval. Unlike the function arange(), the interval includes stop by default; when endpoint is set to False, the values are generated within the half open interval [start, stop).

histogram(a[, bins, range])

As name implies, this function will compute the histogram of the input data.

power(base, exp)

As name implies, this function will return element-wise result of base element raised to powers from exp element. Both inputs i.e. base and exp, can be either Symbol or scalar. Here note that broadcasting is not allowed. You can use broadcast_pow if you want to use the feature of broadcast.

SoftmaxActivation([data, mode, name, attr, out])

This function applies softmax activation to input. It is intended for internal layers. It is actually deprecated, we can use softmax() instead.
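The difference between arange() and linspace() in the table above is easiest to see on plain numbers. The sketch below re-implements both in plain Python for illustration only; it assumes a positive step and the NumPy-style convention that linspace includes stop unless endpoint is False.

```python
import math


def arange(start, stop, step=1.0):
    """Values in the half open interval [start, stop), spaced by step."""
    count = max(0, math.ceil((stop - start) / step))
    return [start + i * step for i in range(count)]


def linspace(start, stop, num, endpoint=True):
    """num evenly spaced values; stop is included only if endpoint is True."""
    if num == 1:
        return [float(start)]
    divisions = (num - 1) if endpoint else num
    step = (stop - start) / divisions
    return [start + i * step for i in range(num)]


half_open = arange(0.0, 1.0, 0.25)
closed = linspace(0.0, 1.0, 5)
open_ended = linspace(0.0, 1.0, 4, endpoint=False)
print(half_open, closed, open_ended)
```

With arange() the caller fixes the step, while with linspace() the caller fixes the number of points.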

Implementation Examples

In the example below we use the function power(), which returns the element-wise result of base raised to the powers from exp. When both arguments are plain Python numbers, the result is a plain number as well −

import mxnet as mx
mx.sym.power(3, 5)

Output

You will see the following output −

243

Example

x = mx.sym.Variable('x')
y = mx.sym.Variable('y')
z = mx.sym.power(x, 3)
z.eval(x=mx.nd.array([1,2]))[0].asnumpy()

Output

This produces the following output −

array([1., 8.], dtype=float32)

Example

z = mx.sym.power(4, y)
z.eval(y=mx.nd.array([2,3]))[0].asnumpy()

Output

When you execute the above code, you should see the following output −

array([16., 64.], dtype=float32)

Example

z = mx.sym.power(x, y)
z.eval(x=mx.nd.array([4,5]), y=mx.nd.array([2,3]))[0].asnumpy()

Output

The output is given below −

array([ 16., 125.], dtype=float32)

In the example given below, we apply softmax to an input. SoftmaxActivation() is deprecated, so the example uses softmax() instead, here in its NDArray form mx.nd.softmax() so that the values are computed directly −

input_data = mx.nd.array([[2., 0.9, -0.5, 4., 8.], [4., -.7, 9., 2., 0.9]])
soft_max_act = mx.nd.softmax(input_data)
print(soft_max_act.asnumpy())

Output

You will see the following output −

[[2.4258138e-03 8.0748333e-04 1.9912292e-04 1.7924475e-02 9.7864312e-01]
[6.6843745e-03 6.0796250e-05 9.9204916e-01 9.0463174e-04 3.0112563e-04]]
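The numbers above follow directly from the softmax definition, exp(x_i) divided by the sum of exp(x_j) over a row. A plain-Python sketch that reproduces them, subtracting the row maximum first for numerical stability:

```python
import math


def softmax(row):
    """Softmax over one row, shifted by the maximum for numerical stability."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


rows = [[2., 0.9, -0.5, 4., 8.], [4., -.7, 9., 2., 0.9]]
probs = [softmax(r) for r in rows]
for p in probs:
    print(p)
```

Each row sums to one, and the largest input in a row receives almost all of the probability mass.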

symbol.contrib

The Contrib Symbol API is defined in the symbol.contrib package. It typically provides many useful experimental APIs for new features. This API works as a place where the community can try out new features, and the feature contributors get feedback as well.

Functions and their parameters

Following are some of the important functions and their parameters covered by mxnet.symbol.contrib API −

Function and its Parameters

Definition

rand_zipfian(true_classes, num_sampled, …)

This function draws random samples from an approximately Zipfian distribution.

foreach(body, data, init_states)

As the name implies, this function runs a loop with user-defined computation over NDArrays on dimension 0. This function simulates a for loop, and body has the computation for one iteration of the for loop.

while_loop(cond, func, loop_vars[, …])

This function runs a while loop with a user-defined computation and loop condition.

cond(pred, then_func, else_func)

As the name implies, this function runs an if-then-else using a user-defined condition and computation. This function simulates an if-like branch which chooses one of the two customized computations according to the specified condition.

getnnz([data, axis, out, name])

This function gives us the number of stored values for a sparse tensor. It also includes explicit zeros. It only supports CSR matrix on CPU.

requantize([data, min_range, max_range, …])

This function requantizes the given data, which is quantized in int32 with the corresponding thresholds, into int8 using min and max thresholds either calculated at runtime or obtained from calibration.

index_copy([old_tensor, index_vector, …])

This function copies the elements of a new_tensor into the old_tensor by selecting the indices in the order given in index. The output of this operator is a new tensor that contains the remaining elements of the old tensor and the copied elements of the new tensor.

interleaved_matmul_encdec_qk([queries, …])

This operator computes the matrix multiplication between the projections of queries and keys in multi-head attention used as encoder-decoder. The condition is that the inputs should be a tensor of projections of queries that follows the layout: (seq_length, batch_size, num_heads * head_dim).

Implementation Examples

In the example below we use the function rand_zipfian to draw random samples from an approximately Zipfian distribution −

import mxnet as mx
true_cls = mx.sym.Variable('true_cls')
samples, exp_count_true, exp_count_sample = mx.sym.contrib.rand_zipfian(true_cls, 5, 6)
samples.eval(true_cls=mx.nd.array([3]))[0].asnumpy()

Output

You will see the following output −

array([4, 0, 2, 1, 5], dtype=int64)

Example

exp_count_true.eval(true_cls=mx.nd.array([3]))[0].asnumpy()

Output

The output is given below −

array([0.57336551])

Example

exp_count_sample.eval(true_cls=mx.nd.array([3]))[0].asnumpy()

Output

You will see the following output −

array([1.78103594, 0.46847373, 1.04183923, 0.57336551, 1.04183923])
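The expected counts printed above can be reproduced by hand. The sketch below assumes the log-uniform form of the approximately Zipfian distribution, P(class) = (log(class + 2) - log(class + 1)) / log(range_max + 1), with the expected count being that probability times num_sampled. The function names are illustrative and the sampler is a plain-Python stand-in, not the MXNet operator.

```python
import math
import random


def log_uniform_prob(cls, range_max):
    """Probability of drawing class cls from the log-uniform distribution."""
    return (math.log(cls + 2) - math.log(cls + 1)) / math.log(range_max + 1)


def expected_count(cls, num_sampled, range_max):
    return log_uniform_prob(cls, range_max) * num_sampled


def sample_log_uniform(num_sampled, range_max, seed=0):
    """Draw classes in [0, range_max) by exponentiating a uniform variable."""
    rng = random.Random(seed)
    log_range = math.log(range_max + 1)
    return [int(math.exp(rng.uniform(0.0, log_range)) - 1) % range_max
            for _ in range(num_sampled)]


count_true = expected_count(3, num_sampled=5, range_max=6)
count_zero = expected_count(0, num_sampled=5, range_max=6)
samples = sample_log_uniform(5, 6)
print(count_true, count_zero, samples)
```

Under this assumption, class 3 with 5 samples and a range of 6 gives the same value as exp_count_true above, and class 0 gives the largest of the exp_count_sample values.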

In the example below we use the function while_loop to run a while loop with a user-defined computation and loop condition −

cond = lambda i, s: i <= 7
func = lambda i, s: ([i + s], [i + 1, s + i])
loop_vars = (mx.sym.var('i'), mx.sym.var('s'))
outputs, states = mx.sym.contrib.while_loop(cond, func, loop_vars, max_iterations=10)
print(outputs)

Output

The output is given below:

[<Symbol _while_loop0>]

Example

print(states)

Output

This produces the following output −

[<Symbol _while_loop0>, <Symbol _while_loop0>]
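Since the symbolic version only prints symbols, the semantics of the loop are easier to follow in a plain-Python sketch. The function below mimics the described behaviour on ordinary numbers; the starting values i = 0 and s = 1 are assumptions chosen for illustration, and the real operator may additionally stack and pad the per-step outputs.

```python
def while_loop(cond, func, loop_vars, max_iterations):
    """Run func while cond holds, for at most max_iterations steps."""
    outputs = []
    state = tuple(loop_vars)
    steps = 0
    while steps < max_iterations and cond(*state):
        step_output, new_state = func(*state)
        outputs.append(step_output)   # collect the output of every step
        state = tuple(new_state)      # carry the loop variables forward
        steps += 1
    return outputs, state


cond = lambda i, s: i <= 7
func = lambda i, s: ([i + s], [i + 1, s + i])
outputs, states = while_loop(cond, func, (0, 1), max_iterations=10)
print(outputs)
print(states)
```

The loop stops after eight steps because the condition i <= 7 fails first; max_iterations only acts as an upper bound.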

In the example below we use the function index_copy, which copies the elements of new_tensor into the old_tensor. The example calls the NDArray version, mx.nd.contrib.index_copy, so that the result is printed directly.

import mxnet as mx
a = mx.nd.zeros((6,3))
b = mx.nd.array([[1,2,3],[4,5,6],[7,8,9]])
index = mx.nd.array([0,4,2])
mx.nd.contrib.index_copy(a, index, b)

Output

When you execute the above code, you should see the following output −

[[1. 2. 3.]
[0. 0. 0.]
[7. 8. 9.]
[0. 0. 0.]
[4. 5. 6.]
[0. 0. 0.]]
<NDArray 6x3 @cpu(0)>
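The same row-copy logic can be written in a few lines of plain Python, which makes the role of the index vector explicit. This is only a sketch on nested lists, not the MXNet operator:

```python
def index_copy(old_rows, index, new_rows):
    """Return a copy of old_rows where row index[k] is replaced by new_rows[k]."""
    result = [list(row) for row in old_rows]  # leave the input untouched
    for source, target in enumerate(index):
        result[target] = list(new_rows[source])
    return result


a = [[0.0] * 3 for _ in range(6)]
b = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
out = index_copy(a, [0, 4, 2], b)
for row in out:
    print(row)
```

Rows 0, 4 and 2 of the result come from b in that order, and all other rows keep the values of a.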

symbol.image

The Image Symbol API is defined in the symbol.image package. As the name implies, it is typically used for images and their features.

Functions and their parameters

Following are some of the important functions and their parameters covered by mxnet.symbol.image API −

Function and its Parameters

Definition

adjust_lighting([data, alpha, out, name])

As name implies, this function adjusts the lighting level of the input. It follows the AlexNet style.

crop([data, x, y, width, height, out, name])

With the help of this function we can crop an image NDArray of shape (H x W x C) or (N x H x W x C) to the size given by user.

normalize([data, mean, std, out, name])

It will normalize a tensor of shape (C x H x W) or (N x C x H x W) with mean and standard deviation (SD).

random_crop([data, xrange, yrange, width, …])

Similar to crop(), it randomly crops an image NDArray of shape (H x W x C) or (N x H x W x C) to the size given by the user. It will upsample the result if src is smaller than the size.

random_lighting([data, alpha_std, out, name])

As name implies, this function adds the PCA noise randomly. It also follows the AlexNet style.

random_resized_crop([data, xrange, yrange, …])

It also randomly crops an image NDArray of shape (H x W x C) or (N x H x W x C) to the given size. It will upsample the result if src is smaller than the size. It will randomize the area and aspect ratio as well.

resize([data, size, keep_ratio, interp, …])

As name implies, this function will resize an image NDArray of shape (H x W x C) or (N x H x W x C) to the size given by user.

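to_tensor([data, out, name])

This function converts an image NDArray of shape (H x W x C) or (N x H x W x C) with values in the range [0, 255] to a tensor of shape (C x H x W) or (N x C x H x W) with values in the range [0, 1].

Implementation Examples

In the example below, we use the function to_tensor() on a randomly generated image −

import mxnet as mx
import numpy as np

img = mx.sym.random.uniform(0, 255, (4, 2, 3)).astype(dtype=np.uint8)

mx.sym.image.to_tensor(img)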

Output

The output is stated below −

<Symbol to_tensor4>

Example

img = mx.sym.random.uniform(0, 255, (2, 4, 2, 3)).astype(dtype=np.uint8)

mx.sym.image.to_tensor(img)

Output

The output is given below:

<Symbol to_tensor5>

In the example below, we use the function normalize() to normalize a tensor of shape (C x H x W) or (N x C x H x W) with mean and standard deviation (SD).

img = mx.sym.random.uniform(0, 1, (3, 4, 2))

mx.sym.image.normalize(img, mean=(0, 1, 2), std=(3, 2, 1))

Output

Given below is the output of the code −

<Symbol normalize0>

Example

img = mx.sym.random.uniform(0, 1, (2, 3, 4, 2))

mx.sym.image.normalize(img, mean=(0, 1, 2), std=(3, 2, 1))

Output

The output is shown below −

<Symbol normalize1>
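Because the symbolic calls above only print symbol names, the following plain-Python sketch shows what the two operations are understood to compute on small nested lists. It assumes that to_tensor() moves the channel axis to the front and scales values from [0, 255] to [0, 1], and that normalize() applies (value - mean) / std per channel; it is an illustration, not the MXNet implementation.

```python
def to_tensor(img_hwc):
    """(H, W, C) image with values in [0, 255] -> (C, H, W) with values in [0, 1]."""
    height, width, channels = len(img_hwc), len(img_hwc[0]), len(img_hwc[0][0])
    return [[[img_hwc[y][x][c] / 255.0 for x in range(width)]
             for y in range(height)]
            for c in range(channels)]


def normalize(img_chw, mean, std):
    """Per-channel (value - mean) / std on a (C, H, W) tensor."""
    return [[[(v - mean[c]) / std[c] for v in row] for row in channel]
            for c, channel in enumerate(img_chw)]


pixel = [[[0, 255, 51]]]                      # H=1, W=1, C=3
tensor = to_tensor(pixel)

img = [[[0.0, 3.0]], [[1.0, 5.0]], [[2.0, 3.0]]]   # C=3, H=1, W=2
normalized = normalize(img, mean=(0, 1, 2), std=(3, 2, 1))
print(tensor)
print(normalized)
```

In a typical pipeline to_tensor() runs first and normalize() second, so that the mean and std values refer to the [0, 1] range.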

symbol.random

The Random Symbol API is defined in the symbol.random package. As the name implies, it is the random distribution generator Symbol API of MXNet.

Functions and their parameters

Following are some of the important functions and their parameters covered by mxnet.symbol.random API −

Function and its Parameters

Definition

uniform([low, high, shape, dtype, ctx, out])

It generates random samples from a uniform distribution.

normal([loc, scale, shape, dtype, ctx, out])

It generates random samples from a normal (Gaussian) distribution.

randn(*shape, **kwargs)

It generates random samples from a normal (Gaussian) distribution.

poisson([lam, shape, dtype, ctx, out])

It generates random samples from a Poisson distribution.

exponential([scale, shape, dtype, ctx, out])

It generates samples from an exponential distribution.

gamma([alpha, beta, shape, dtype, ctx, out])

It generates random samples from a gamma distribution.

multinomial(data[, shape, get_prob, out, dtype])

It draws samples concurrently from multiple multinomial distributions.

negative_binomial([k, p, shape, dtype, ctx, out])

It generates random samples from a negative binomial distribution.

generalized_negative_binomial([mu, alpha, …])

It generates random samples from a generalized negative binomial distribution.

shuffle(data, **kwargs)

It shuffles the elements randomly.

randint(low, high[, shape, dtype, ctx, out])

It generates random samples from a discrete uniform distribution.

exponential_like([data, lam, out, name])

It generates random samples from an exponential distribution according to the input array shape.

gamma_like([data, alpha, beta, out, name])

It generates random samples from a gamma distribution according to the input array shape.

generalized_negative_binomial_like([data, …])

It generates random samples from a generalized negative binomial distribution according to the input array shape.

negative_binomial_like([data, k, p, out, name])

It generates random samples from a negative binomial distribution according to the input array shape.

normal_like([data, loc, scale, out, name])

It generates random samples from a normal (Gaussian) distribution according to the input array shape.

poisson_like([data, lam, out, name])

It generates random samples from a Poisson distribution according to the input array shape.

uniform_like([data, low, high, out, name])

It generates random samples from a uniform distribution according to the input array shape.

Implementation Examples

In the example below, we shuffle the elements randomly using the shuffle() function. It shuffles the array along the first axis.

data = mx.nd.array([[0, 1, 2], [3, 4, 5], [6, 7, 8],[9,10,11]])
x = mx.sym.Variable('x')
y = mx.sym.random.shuffle(x)
y.eval(x=data)

Output

You will see output similar to the following (the order is random):

[
[[ 9. 10. 11.]
[ 0. 1. 2.]
[ 6. 7. 8.]
[ 3. 4. 5.]]
<NDArray 4x3 @cpu(0)>]

Example

y.eval(x=data)

Output

When you execute the above code again, you should see a different order, for example −

[
[[ 6. 7. 8.]
[ 0. 1. 2.]
[ 3. 4. 5.]
[ 9. 10. 11.]]
<NDArray 4x3 @cpu(0)>]
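Shuffling along the first axis means that whole rows change places while the contents of each row stay together. A plain-Python sketch with the standard random module shows the same effect on a nested list; the helper name is illustrative.

```python
import random


def shuffle_first_axis(rows, seed=None):
    """Return the rows in random order; each row itself is left intact."""
    rng = random.Random(seed)
    shuffled = [list(row) for row in rows]
    rng.shuffle(shuffled)
    return shuffled


data = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
shuffled = shuffle_first_axis(data)
print(shuffled)
```

Two calls generally give two different orders, which matches the two different outputs shown above.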

In the example below, we draw random samples from a generalized negative binomial distribution. For this we use the function generalized_negative_binomial().

mx.sym.random.generalized_negative_binomial(10, 0.1)

Output

The output is given below −

<Symbol _random_generalized_negative_binomial0>

symbol.sparse

The Sparse Symbol API is defined in the mxnet.symbol.sparse package. As the name implies, it provides sparse neural network graphs and auto-differentiation on CPU.

Functions and their parameters

Following are some of the important functions (including Symbol creation routines, Symbol manipulation routines, mathematical functions, trigonometric functions, hyperbolic functions, reduce functions, rounding, powers, and neural network functions) and their parameters covered by mxnet.symbol.sparse API −

Function and its Parameters

Definition

ElementWiseSum(*args, **kwargs)

This function adds all input arguments element-wise: add_n(a1, a2, …, an) = a1 + a2 + ⋯ + an. Here, add_n is potentially more efficient than calling add n times.

Embedding([data, weight, input_dim, …])

It will map the integer indices to vector representations i.e. embeddings. It actually maps words to real-valued vectors in high-dimensional space which is called word embeddings.

LinearRegressionOutput([data, label, …])

It computes and optimizes for squared loss during backward propagation giving just output data during forward propagation.

LogisticRegressionOutput([data, label, …])

Applies a logistic function, which is also called the sigmoid function, to the input. The function is computed as 1 / (1 + exp(−x)).

MAERegressionOutput([data, label, …])

This operator computes mean absolute error of the input. MAE is actually a risk metric corresponding to the expected value of absolute error.

abs([data, name, attr, out])

As name implies, this function will return element-wise absolute value of the input.

adagrad_update([weight, grad, history, lr, …])

It is an update function for AdaGrad optimizer.

adam_update([weight, grad, mean, var, lr, …])

It is an update function for Adam optimizer.

add_n(*args, **kwargs)

As the name implies, it adds all input arguments element-wise.

arccos([data, name, attr, out])

This function returns the element-wise inverse cosine of the input array.

dot([lhs, rhs, transpose_a, transpose_b, …])

As the name implies, it gives the dot product of two arrays. The behavior depends on the input array dimension:
1-D: inner product of vectors
2-D: matrix multiplication
N-D: a sum product over the last axis of the first input and the first axis of the second input

elemwise_add([lhs, rhs, name, attr, out])

As the name implies, it adds arguments element-wise.

elemwise_div([lhs, rhs, name, attr, out])

As the name implies, it divides arguments element-wise.

elemwise_mul([lhs, rhs, name, attr, out])

As the name implies, it multiplies arguments element-wise.

elemwise_sub([lhs, rhs, name, attr, out])

As the name implies, it subtracts arguments element-wise.

exp([data, name, attr, out])

This function will return element wise exponential value of the given input.

sgd_update([weight, grad, lr, wd, …])

It acts as an update function for Stochastic Gradient Descent optimizer.

sigmoid([data, name, attr, out])

As name implies it will compute sigmoid of x element wise.

sign([data, name, attr, out])

It will return the element wise sign of the given input.

sin([data, name, attr, out])

As the name implies, this function computes the element-wise sine of the given input array.

Implementation Example

In the example below, we use the Embedding() function, which maps integer indices to vector representations, i.e. word embeddings.

input_dim = 4
output_dim = 5

# Every row in the weight matrix y represents a word, so y = (w0, w1, w2, w3)
y = [[ 0., 1., 2., 3., 4.],
[ 5., 6., 7., 8., 9.],
[ 10., 11., 12., 13., 14.],
[ 15., 16., 17., 18., 19.]]
# The input array x represents n-grams (2-grams), so x = [(w1, w3), (w0, w2)]
x = [[ 1., 3.],
[ 0., 2.]]
# The input x is mapped to its vector representation through y.
# Embedding(x, y, 4, 5) returns:
# [[[ 5., 6., 7., 8., 9.],
# [ 15., 16., 17., 18., 19.]],
# [[ 0., 1., 2., 3., 4.],
# [ 10., 11., 12., 13., 14.]]]
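The lookup described above amounts to using each index to pick one row of the weight matrix. A plain-Python sketch that reproduces the mapping for the same numbers (the function name embedding is illustrative, not the MXNet operator):

```python
def embedding(indices, weight):
    """Replace every index by the corresponding row of the weight matrix."""
    return [[list(weight[int(i)]) for i in row] for row in indices]


# 4 words (input_dim) with 5-dimensional vectors (output_dim)
weight = [[float(5 * r + c) for c in range(5)] for r in range(4)]
x = [[1., 3.], [0., 2.]]
out = embedding(x, weight)
for pair in out:
    print(pair)
```

The output has one more dimension than the input: a (2, 2) index array becomes a (2, 2, 5) array of vectors.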

Apache MXNet - Python API Module

Apache MXNet’s module API is like a FeedForward model, and it is easy to compose, similar to a Torch module. It consists of the following classes −

BaseModule([logger])

It represents the base class of a module. A module can be thought of as a computation component or computation machine. The job of a module is to execute forward and backward passes. It also updates parameters in a model.

Methods

Following table shows the methods of the BaseModule class −

Methods

Definition

backward([out_grads])

As name implies this method implements the backward computation.

bind(data_shapes[, label_shapes, …])

It binds the symbols to construct executors and it is necessary before one can perform computation with the module.

fit(train_data[, eval_data, eval_metric, …])

This method trains the module parameters.

forward(data_batch[, is_train])

As name implies this method implements the Forward computation. This method supports data batches with various shapes like different batch sizes or different image sizes.

forward_backward(data_batch)

It is a convenient function, as name implies, that calls both forward and backward.

get_input_grads([merge_multi_context])

This method gets the gradients with respect to the inputs, which were computed in the previous backward computation.

get_outputs([merge_multi_context])

As the name implies, this method gets the outputs of the previous forward computation.

get_params()

It gets the parameters, especially those which are potentially copies of the actual parameters used to do computation on the device.

get_states([merge_multi_context])

This method will get states from all devices.

init_optimizer([kvstore, optimizer, …])

This method installs and initializes the optimizers. It also initializes kvstore for distributed training.

init_params([initializer, arg_params, …])

As name implies, this method will initialize the parameters and auxiliary states.

install_monitor(mon)

This method will install monitor on all executors.

iter_predict(eval_data[, num_batch, reset, …])

This method will iterate over predictions.

load_params(fname)

It will, as name specifies, load model parameters from file.

predict(eval_data[, num_batch, …])

It will run the prediction and collects the outputs as well.

prepare(data_batch[, sparse_row_id_fn])

The operator prepares the module for processing a given data batch.

save_params(fname)

As name specifies, this function will save the model parameters to file.

score(eval_data, eval_metric[, num_batch, …])

It runs the prediction on eval_data and also evaluates the performance according to the given eval_metric.

set_params(arg_params, aux_params[, …])

This method will assign the parameter and aux state values.

set_states([states, value])

This method, as name implies, sets value for states.

update()

This method updates the given parameters according to the installed optimizer. It also updates the gradients computed in the previous forward-backward batch.

update_metric(eval_metric, labels[, pre_sliced])

This method, as name implies, evaluates and accumulates the evaluation metric on outputs of the last forward computation.
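The order in which forward, backward and update are called can be illustrated with a small pure-Python sketch. The class TinyLinearModule below is hypothetical and is not part of MXNet; it only mirrors the method names from the table, assuming a single scalar weight, a squared-error loss and a plain SGD step.

```python
class TinyLinearModule:
    """Hypothetical stand-in for a module; it is NOT part of MXNet."""

    def __init__(self, weight=0.0, learning_rate=0.1):
        self.weight = weight
        self.learning_rate = learning_rate
        self._x = None
        self._output = None
        self._grad = None

    def forward(self, x):
        # Forward computation: cache the input and the output.
        self._x = x
        self._output = self.weight * x

    def get_outputs(self):
        # Outputs of the previous forward computation.
        return [self._output]

    def backward(self, target):
        # Gradient of 0.5 * (output - target)**2 with respect to the weight.
        self._grad = (self._output - target) * self._x

    def forward_backward(self, x, target):
        # Convenience wrapper that calls both steps in order.
        self.forward(x)
        self.backward(target)

    def update(self):
        # SGD step that uses the gradient stored by the last backward call.
        self.weight -= self.learning_rate * self._grad


mod = TinyLinearModule()
for _ in range(50):
    mod.forward_backward(x=2.0, target=6.0)  # the ideal weight is 3.0
    mod.update()
print(round(mod.weight, 3))
```

Calling update before backward would fail in this sketch because no gradient has been stored yet, which is why each training step runs the forward-backward pass first.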

Attributes

The following table lists the attributes of the BaseModule class −

Attributes

Definition

data_names

It consists of the list of names for data required by this module.

data_shapes

It consists of the list of (name, shape) pairs specifying the data inputs to this module.

label_shapes

It shows the list of (name, shape) pairs specifying the label inputs to this module.

output_names

It consists of the list of names for the outputs of this module.

output_shapes

It consists of the list of (name, shape) pairs specifying the outputs of this module.

symbol

As the name specifies, this attribute gets the symbol associated with this module.

data_shapes: Refer to https://mxnet.apache.org for details.

output_shapes: More information is available at https://mxnet.apache.org/api/python

BucketingModule(sym_gen[…])

It represents the BucketingModule class, a module that helps to deal efficiently with inputs of varying length.

Methods

The following table lists the methods of the BucketingModule class −

Methods

Definition

backward([out_grads])

As the name implies, this method implements the backward computation.

bind(data_shapes[, label_shapes, …])

It sets up the buckets and binds the executor for the default bucket key. This is the binding method for a BucketingModule.

forward(data_batch[, is_train])

As the name implies, this method implements the forward computation. It supports data batches with varying shapes, such as different batch sizes or different image sizes.

get_input_grads([merge_multi_context])

This method gets the gradients with respect to the inputs, which were computed in the previous backward computation.

get_outputs([merge_multi_context])

As the name implies, this method gets the outputs of the previous forward computation.

get_params()

It gets the current parameters; these may be copies of the actual parameters used for computation on the device.

get_states([merge_multi_context])

This method gets the states from all devices.

init_optimizer([kvstore, optimizer, …])

This method installs and initializes the optimizers. It also initializes the kvstore for distributed training.

init_params([initializer, arg_params, …])

As the name implies, this method initializes the parameters and auxiliary states.

install_monitor(mon)

This method installs a monitor on all executors.

load(prefix, epoch[, sym_gen, …])

This method creates a model from a previously saved checkpoint.

load_dict([sym_dict, sym_gen, …])

This method creates a model from a dictionary (dict) mapping bucket_key to symbols. It also shares arg_params and aux_params.

prepare(data_batch[, sparse_row_id_fn])

This method prepares the module for processing a given data batch.

save_checkpoint(prefix, epoch[, remove_amp_cast])

As the name implies, this method saves the current progress to a checkpoint for all buckets in the BucketingModule. It is recommended to use mx.callback.module_checkpoint as epoch_end_callback to save during training.

set_params(arg_params, aux_params[,…])

As the name specifies, this method assigns the parameter and auxiliary state values.

set_states([states, value])

As the name implies, this method sets the values of the states.

switch_bucket(bucket_key, data_shapes[, …])

It switches to a different bucket.

update()

This method updates the parameters according to the installed optimizer, using the gradients computed in the previous forward-backward batch.

update_metric(eval_metric, labels[, pre_sliced])

As the name implies, this method evaluates and accumulates the evaluation metric on the outputs of the last forward computation.
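To give a feel for how varying-length inputs can be grouped into buckets, here is a minimal pure-Python sketch. It assumes one common convention: each sequence is assigned to the smallest bucket that can hold it and is padded up to that size, with the bucket size serving as the bucket key. The helper names choose_bucket and pad_to_bucket and the bucket sizes are hypothetical and are not MXNet functions.

```python
def choose_bucket(length, buckets):
    """Return the smallest bucket that can hold a sequence of `length` items."""
    for size in sorted(buckets):
        if length <= size:
            return size
    raise ValueError("sequence of length %d exceeds the largest bucket" % length)


def pad_to_bucket(sequence, buckets, pad_value=0):
    """Pad `sequence` up to its bucket size and return (bucket_key, padded)."""
    key = choose_bucket(len(sequence), buckets)
    return key, list(sequence) + [pad_value] * (key - len(sequence))


buckets = [5, 10, 20]  # hypothetical bucket sizes
key, padded = pad_to_bucket([7, 8, 9, 4, 2, 6], buckets)
print(key, padded)  # 10 [7, 8, 9, 4, 2, 6, 0, 0, 0, 0]
```

With a scheme like this, only one network shape per bucket is needed instead of one per distinct input length.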

Attributes

The following table lists the attributes of the BucketingModule class −

Attributes

Definition

data_names

It consists of the list of names for data required by this module.

data_shapes

It consists of the list of (name, shape) pairs specifying the data inputs to this module.

label_shapes

It shows the list of (name, shape) pairs specifying the label inputs to this module.

output_names

It consists of the list of names for the outputs of this module.

output_shapes

It consists of the list of (name, shape) pairs specifying the outputs of this module.

symbol

As the name specifies, this attribute gets the symbol associated with this module.

data_shapes − Refer to https://mxnet.apache.org/api/python/docs for more information.

output_shapes − Refer to https://mxnet.apache.org/api/python/docs for more information.

Module(symbol[,data_names, label_names,…])

It represents a basic module that wraps a symbol.

Methods

The following table lists the methods of the Module class −

Methods

Definition

backward([out_grads])

As the name implies, this method implements the backward computation.

bind(data_shapes[, label_shapes, …])

It binds the symbols to construct executors. This is necessary before any computation can be performed with the module.

borrow_optimizer(shared_module)

As the name implies, this method borrows the optimizer from a shared module.

forward(data_batch[, is_train])

As the name implies, this method implements the forward computation. It supports data batches with varying shapes, such as different batch sizes or different image sizes.

get_input_grads([merge_multi_context])

This method gets the gradients with respect to the inputs, which were computed in the previous backward computation.

get_outputs([merge_multi_context])

As the name implies, this method gets the outputs of the previous forward computation.

get_params()

It gets the parameters; these may be copies of the actual parameters used for computation on the device.

get_states([merge_multi_context])

This method gets the states from all devices.

init_optimizer([kvstore, optimizer, …])

This method installs and initializes the optimizers. It also initializes the kvstore for distributed training.

init_params([initializer, arg_params, …])

As the name implies, this method initializes the parameters and auxiliary states.

install_monitor(mon)

This method installs a monitor on all executors.

load(prefix, epoch[, sym_gen, …])

This method creates a model from a previously saved checkpoint.

load_optimizer_states(fname)

This method loads the optimizer state, i.e. the updater state, from a file.

prepare(data_batch[, sparse_row_id_fn])

This method prepares the module for processing a given data batch.

reshape(data_shapes[, label_shapes])

As the name implies, this method reshapes the module for new input shapes.

save_checkpoint(prefix, epoch[, …])

It saves the current progress to a checkpoint.

save_optimizer_states(fname)

This method saves the optimizer state, i.e. the updater state, to a file.

set_params(arg_params, aux_params[,…])

As the name specifies, this method assigns the parameter and auxiliary state values.

set_states([states, value])

As the name implies, this method sets the values of the states.

update()

This method updates the parameters according to the installed optimizer, using the gradients computed in the previous forward-backward batch.

update_metric(eval_metric, labels[, pre_sliced])

As the name implies, this method evaluates and accumulates the evaluation metric on the outputs of the last forward computation.
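The phrase "evaluates and accumulates" in the update_metric description can be made concrete with a small pure-Python sketch. The RunningAccuracy class below is hypothetical and is not an MXNet class; it assumes that labels and predictions are plain lists of class indices and shows how a metric keeps running totals across batches instead of overwriting them.

```python
class RunningAccuracy:
    """Hypothetical metric that accumulates across batches; not an MXNet class."""

    def __init__(self):
        self.num_correct = 0
        self.num_seen = 0

    def update(self, labels, predictions):
        # Add this batch's counts to the running totals.
        for label, pred in zip(labels, predictions):
            self.num_correct += int(label == pred)
            self.num_seen += 1

    def get(self):
        # Accuracy over every batch seen so far.
        return self.num_correct / self.num_seen if self.num_seen else float("nan")


metric = RunningAccuracy()
metric.update(labels=[1, 0, 1], predictions=[1, 1, 1])  # 2 of 3 correct
metric.update(labels=[0, 0], predictions=[0, 0])        # 2 of 2 correct
print(metric.get())  # 0.8
```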

Attributes

The following table lists the attributes of the Module class −

Attributes

Definition

data_names

It consists of the list of names for data required by this module.

data_shapes

It consists of the list of (name, shape) pairs specifying the data inputs to this module.

label_shapes

It shows the list of (name, shape) pairs specifying the label inputs to this module.

output_names

It consists of the list of names for the outputs of this module.

output_shapes

It consists of the list of (name, shape) pairs specifying the outputs of this module.

label_names

It consists of the list of names for labels required by this module.

data_shapes: Visit https://mxnet.apache.org/api/python/docs/api/module for further details.

output_shapes: See https://mxnet.apache.org/api/python/docs/api/module/index.html for further information.

PythonLossModule([name,data_names,…])

This class is derived from mxnet.module.python_module.PythonModule. PythonLossModule is a convenience module class that implements all or many of the module APIs as empty functions.

Methods

The following table lists the methods of the PythonLossModule class −

Methods

Definition

backward([out_grads])

As the name implies, this method implements the backward computation.

forward(data_batch[, is_train])

As the name implies, this method implements the forward computation. It supports data batches with varying shapes, such as different batch sizes or different image sizes.

get_input_grads([merge_multi_context])

This method gets the gradients with respect to the inputs, which were computed in the previous backward computation.

get_outputs([merge_multi_context])

As the name implies, this method gets the outputs of the previous forward computation.

install_monitor(mon)

This method installs a monitor on all executors.
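How these five methods fit together in a loss-style module can be sketched in pure Python. The sketch rests on an assumption: forward simply records the incoming scores and labels, and backward delegates the gradient computation to a user-supplied function. The names SketchLossModule and squared_error_grad are hypothetical and are not MXNet code.

```python
class SketchLossModule:
    """Hypothetical, simplified analogue of a loss module; not MXNet code."""

    def __init__(self, grad_func):
        self._grad_func = grad_func  # user-supplied: (scores, labels) -> grads
        self._scores = None
        self._labels = None
        self._scores_grad = None

    def forward(self, scores, labels=None):
        # The forward computation only records its inputs.
        self._scores = scores
        self._labels = labels

    def get_outputs(self):
        # The outputs are the scores passed through unchanged.
        return [self._scores]

    def backward(self):
        # Delegate the gradient computation to the supplied function.
        self._scores_grad = self._grad_func(self._scores, self._labels)

    def get_input_grads(self):
        # Gradients with respect to the inputs from the last backward call.
        return [self._scores_grad]


def squared_error_grad(scores, labels):
    # The derivative of 0.5 * (s - y)**2 with respect to s is (s - y).
    return [s - y for s, y in zip(scores, labels)]


loss = SketchLossModule(squared_error_grad)
loss.forward(scores=[0.5, 2.0], labels=[1.0, 2.0])
loss.backward()
print(loss.get_input_grads()[0])  # [-0.5, 0.0]
```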

PythonModule([data_names,label_names…])

This class is derived from mxnet.module.base_module.BaseModule. PythonModule is also a convenience module class that implements all or many of the module APIs as empty functions.

Methods

The following table lists the methods of the PythonModule class −

Methods

Definition

bind(data_shapes[, label_shapes, …])

It binds the symbols to construct executors. This is necessary before any computation can be performed with the module.

get_params()

It gets the parameters; these may be copies of the actual parameters used for computation on the device.

init_optimizer([kvstore, optimizer, …])

This method installs and initializes the optimizers. It also initializes the kvstore for distributed training.

init_params([initializer, arg_params, …])

As the name implies, this method initializes the parameters and auxiliary states.

update()

This method updates the parameters according to the installed optimizer, using the gradients computed in the previous forward-backward batch.

update_metric(eval_metric, labels[, pre_sliced])

As the name implies, this method evaluates and accumulates the evaluation metric on the outputs of the last forward computation.

Attributes

The following table lists the attributes of the PythonModule class −

Attributes

Definition

data_names

It consists of the list of names for data required by this module.

data_shapes

It consists of the list of (name, shape) pairs specifying the data inputs to this module.

label_shapes

It shows the list of (name, shape) pairs specifying the label inputs to this module.

output_names

It consists of the list of names for the outputs of this module.

output_shapes

It consists of the list of (name, shape) pairs specifying the outputs of this module.

data_shapes − Follow the link https://mxnet.apache.org for details.

output_shapes − For more details, visit https://mxnet.apache.org

SequentialModule([logger])

This class is derived from mxnet.module.base_module.BaseModule. SequentialModule is a container module that can chain multiple modules together.
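The idea of chaining modules can be sketched in pure Python: the outputs of each module become the inputs of the next one. The classes Scale, Shift and SketchSequential below are hypothetical illustrations and are not MXNet code; having add return the container is a design choice of this sketch that makes chained calls possible.

```python
class Scale:
    """Hypothetical module that multiplies every input value by a constant."""

    def __init__(self, factor):
        self.factor = factor
        self._out = None

    def forward(self, values):
        self._out = [v * self.factor for v in values]

    def get_outputs(self):
        return self._out


class Shift:
    """Hypothetical module that adds a constant to every input value."""

    def __init__(self, offset):
        self.offset = offset
        self._out = None

    def forward(self, values):
        self._out = [v + self.offset for v in values]

    def get_outputs(self):
        return self._out


class SketchSequential:
    """Hypothetical container that runs its modules one after another."""

    def __init__(self):
        self._modules = []
        self._out = None

    def add(self, module):
        self._modules.append(module)
        return self  # returning self allows chained add() calls

    def forward(self, values):
        # Feed the outputs of each module into the next one.
        for module in self._modules:
            module.forward(values)
            values = module.get_outputs()
        self._out = values

    def get_outputs(self):
        return self._out


chain = SketchSequential().add(Scale(2)).add(Shift(1))
chain.forward([1, 2, 3])
print(chain.get_outputs())  # [3, 5, 7]
```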

Methods

The following table lists the methods of the SequentialModule class −

Methods

Definition

add(module, **kwargs)

This is the most important function of this class. It adds a module to the chain.

backward([out_grads])

As the name implies, this method implements the backward computation.

bind(data_shapes[, label_shapes, …])

It binds the symbols to construct executors. This is necessary before any computation can be performed with the module.

forward(data_batch[, is_train])

As the name implies, this method implements the forward computation. It supports data batches with varying shapes, such as different batch sizes or different image sizes.

get_input_grads([merge_multi_context])

This method gets the gradients with respect to the inputs, which were computed in the previous backward computation.

get_outputs([merge_multi_context])

As the name implies, this method gets the outputs of the previous forward computation.

get_params()

It gets the parameters; these may be copies of the actual parameters used for computation on the device.

init_optimizer([kvstore, optimizer, …])

This method installs and initializes the optimizers. It also initializes the kvstore for distributed training.

init_params([initializer, arg_params, …])

As the name implies, this method initializes the parameters and auxiliary states.

install_monitor(mon)

This method installs a monitor on all executors.

update()

This method updates the parameters according to the installed optimizer, using the gradients computed in the previous forward-backward batch.

update_metric(eval_metric, labels[, pre_sliced])

As the name implies, this method evaluates and accumulates the evaluation metric on the outputs of the last forward computation.

Attributes

The following table lists the attributes of the SequentialModule class −

Attributes

Definition

data_names

It consists of the list of names for data required by this module.

data_shapes

It consists of the list of (name, shape) pairs specifying the data inputs to this module.

label_shapes

It shows the list of (name, shape) pairs specifying the label inputs to this module.

output_names

It consists of the list of names for the outputs of this module.

output_shapes

It consists of the list of (name, shape) pairs specifying the outputs of this module.

data_shapes − See https://mxnet.apache.org for a more detailed description of this attribute.

output_shapes − Follow the link available at https://mxnet.apache.org/api for details.

Implementation Examples

In the example below, we are going to create an MXNet module.

import mxnet as mx

# Placeholder for the input data
input_data = mx.symbol.Variable('input_data')
# Three fully connected layers with ReLU activations in between
f_connected1 = mx.symbol.FullyConnected(input_data, name='f_connected1', num_hidden=128)
activation_1 = mx.symbol.Activation(f_connected1, name='relu1', act_type="relu")
f_connected2 = mx.symbol.FullyConnected(activation_1, name='f_connected2', num_hidden=64)
activation_2 = mx.symbol.Activation(f_connected2, name='relu2', act_type="relu")
f_connected3 = mx.symbol.FullyConnected(activation_2, name='fc3', num_hidden=10)
# Softmax output layer
out = mx.symbol.SoftmaxOutput(f_connected3, name='softmax')
# Wrap the symbol in a module; data_names must match the input variable name
mod = mx.mod.Module(out, data_names=['input_data'])
print(out)

Output

The output is shown below −

<Symbol softmax>

Example

print(mod)

Output

The output is shown below −

<mxnet.module.module.Module object at 0x00000123A9892F28>

In the example below, we implement the forward computation −

import mxnet as mx
from collections import namedtuple

# Minimal batch type that exposes only a `data` field
Batch = namedtuple('Batch', ['data'])
data = mx.sym.Variable('data')
out = data * 2
# No labels are needed because we only run the forward pass
mod = mx.mod.Module(symbol=out, label_names=None)
mod.bind(data_shapes=[('data', (1, 10))])
mod.init_params()
data1 = [mx.nd.ones((1, 10))]
mod.forward(Batch(data1))
print(mod.get_outputs()[0].asnumpy())

Output

When you execute the above code, you should see the following output −

[[2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]]

Example

# A batch whose shape differs from the one used in bind
data2 = [mx.nd.ones((3, 5))]
mod.forward(Batch(data2))
print(mod.get_outputs()[0].asnumpy())

Output

Given below is the output of the code −

[[2. 2. 2. 2. 2.]
 [2. 2. 2. 2. 2.]
 [2. 2. 2. 2. 2.]]