Apache MXNet Tutorial
Apache MXNet - System Components
Here, the system components in Apache MXNet are explained in detail. First, we will study the execution engine in MXNet.
Execution Engine
Apache MXNet's execution engine is very versatile. We can use it for deep learning as well as for any domain-specific problem: it executes a bunch of functions while following their dependencies. It is designed in such a way that functions with dependencies are serialized, whereas functions with no dependencies can be executed in parallel.
Core Interface
The API given below is the core interface for Apache MXNet's execution engine −
virtual void PushSync(Fn exec_fun, Context exec_ctx,
std::vector<VarHandle> const& const_vars,
std::vector<VarHandle> const& mutate_vars) = 0;
The above API has the following parameters −
- exec_fun − The core interface API of MXNet allows us to push the function named exec_fun, along with its context information and dependencies, to the execution engine.
- exec_ctx − The context information in which the above-mentioned function exec_fun should be executed.
- const_vars − These are the variables that the function reads from.
- mutate_vars − These are the variables that are to be modified.
The execution engine guarantees its user that the execution of any two functions that modify a common variable is serialized in their push order.
Function
Following is the function type of the execution engine of Apache MXNet −
using Fn = std::function<void(RunContext)>;
In the above function, RunContext contains the runtime information, which is determined by the execution engine. The syntax of RunContext is as follows −
struct RunContext {
// stream pointer which could be safely cast to
// cudaStream_t* type
void *stream;
};
Below are some important points about the execution engine's functions −
- All the functions are executed by the MXNet execution engine's internal threads.
- It is not good to push a blocking function to the execution engine, because the function will then occupy an execution thread and reduce the total throughput.
For this, MXNet provides another, asynchronous function type −
using Callback = std::function<void()>;
using AsyncFn = std::function<void(RunContext, Callback)>;
- In an AsyncFn function we can move the heavy part of the work onto our own threads, but the execution engine does not consider the function finished until we call the callback function.
Context
In Context, we can specify the context in which the function is to be executed. This usually includes the following −
- Whether the function should be run on a CPU or a GPU.
- If we specify a GPU in the Context, which GPU to use.
- There is a huge difference between Context and RunContext: Context holds the device type and device id, whereas RunContext holds the information that can be decided only during runtime.
VarHandle
VarHandle, used to specify the dependencies of functions, is like a token (provided by the execution engine) that we can use to represent the external resources the function can modify or use.
But the question arises: why do we need to use VarHandle? It is because the Apache MXNet engine is designed to be decoupled from other MXNet modules.
Following are some important points about VarHandle −
- It is lightweight, so creating, deleting, or copying a variable incurs little operating cost.
- We need to specify the immutable variables, i.e. the variables that will only be read, in const_vars.
- We need to specify the mutable variables, i.e. the variables that will be modified, in mutate_vars.
- The rule used by the execution engine to resolve the dependencies among functions is that the execution of any two functions, when one of them modifies at least one common variable, is serialized in their push order.
- For creating a new variable, we can use the NewVar() API.
- For deleting a variable, we can use the PushDelete API.
Let us understand its working with a simple example −
Suppose we have two functions, F1 and F2, and they both mutate the variable V2. In that case, F2 is guaranteed to be executed after F1 if F2 is pushed after F1. On the other hand, if F1 and F2 both only read V2, their actual execution order could be random.
Push and Wait
Push and wait are two more useful APIs of the execution engine.
Following are two important features of the Push API −
- All the Push APIs are asynchronous, which means that the API call returns immediately regardless of whether the pushed function has finished or not.
- The Push APIs are not thread safe, which means that only one thread should make engine API calls at a time.
Now, if we talk about the Wait API, the following points describe it −
- If a user wants to wait for a specific function to finish, he/she should include a callback function in the closure and call it at the end of the function.
- On the other hand, if a user wants to wait for all functions that involve a certain variable to finish, he/she should use the WaitForVar(var) API.
- If someone wants to wait for all the pushed functions to finish, then use the WaitForAll() API.
Operators
An operator in Apache MXNet is a class that contains the actual computation logic as well as auxiliary information, and aids the system in performing optimisation.
Operator Interface
Forward is the core operator interface, whose syntax is as follows −
virtual void Forward(const OpContext &ctx,
const std::vector<TBlob> &in_data,
const std::vector<OpReqType> &req,
const std::vector<TBlob> &out_data,
const std::vector<TBlob> &aux_states) = 0;
The structure of OpContext, used in Forward(), is as follows −
struct OpContext {
int is_train;
RunContext run_ctx;
std::vector<Resource> requested;
};
The OpContext describes the state of the operator (whether it is in the train or test phase), which device the operator should be run on, and the requested resources.
From the above Forward core interface, we can understand the remaining arguments as follows −
- in_data and out_data represent the input and output tensors.
- req denotes how the result of the computation is written into out_data.
The OpReqType can be defined as −
enum OpReqType {
kNullOp,
kWriteTo,
kWriteInplace,
kAddTo
};
Like the Forward operator, we can optionally implement the Backward interface as follows −
virtual void Backward(const OpContext &ctx,
const std::vector<TBlob> &out_grad,
const std::vector<TBlob> &in_data,
const std::vector<TBlob> &out_data,
const std::vector<OpReqType> &req,
const std::vector<TBlob> &in_grad,
const std::vector<TBlob> &aux_states);
Various tasks
The Operator interface allows the users to do the following tasks −
- Users can specify in-place updates and reduce the memory allocation cost.
- In order to keep things cleaner, users can hide some internal arguments from Python.
- Users can define the relationship among the input tensors and output tensors.
- To perform computation, users can acquire additional temporary space from the system.
Operator Property
As we know, in a convolutional neural network (CNN) one convolution has several implementations. To achieve the best performance, we might want to switch among those implementations.
That is the reason Apache MXNet separates the operator semantic interface from the implementation interface. This separation is done in the form of the OperatorProperty class, which consists of the following −
InferShape − The InferShape interface has two purposes, as given below −
- The first purpose is to tell the system the size of each input and output tensor, so that the space can be allocated before the Forward and Backward calls.
- The second purpose is to perform a size check to make sure that there is no error before running.
The syntax is given below −
virtual bool InferShape(mxnet::ShapeVector *in_shape,
mxnet::ShapeVector *out_shape,
mxnet::ShapeVector *aux_shape) const = 0;
Request Resource − What if your system could manage the computation workspace for operations like cudnnConvolutionForward? Your system could then perform optimizations such as reusing the space, and many more. MXNet easily achieves this with the help of the following two interfaces −
virtual std::vector<ResourceRequest> ForwardResource(
const mxnet::ShapeVector &in_shape) const;
virtual std::vector<ResourceRequest> BackwardResource(
const mxnet::ShapeVector &in_shape) const;
But what if ForwardResource and BackwardResource return non-empty arrays? In that case, the system offers the corresponding resources through the ctx parameter of the Forward and Backward interfaces of Operator.
Backward dependency − Apache MXNet has the following two different operator signatures to deal with backward dependency −
void FullyConnectedForward(TBlob weight, TBlob in_data, TBlob out_data);
void FullyConnectedBackward(TBlob weight, TBlob in_data, TBlob out_grad, TBlob in_grad);
void PoolingForward(TBlob in_data, TBlob out_data);
void PoolingBackward(TBlob in_data, TBlob out_data, TBlob out_grad, TBlob in_grad);
Here are the two important points to note −
- The out_data in FullyConnectedForward is not used by FullyConnectedBackward, and
- PoolingBackward requires all the arguments of PoolingForward.
That is why for FullyConnectedForward the out_data tensor, once consumed, can be safely freed, because the backward function will not need it. With the help of this, the system can garbage-collect some tensors as early as possible.
In-place Option − Apache MXNet provides another interface to save the cost of memory allocation. The interface is appropriate for element-wise operations in which both input and output tensors have the same shape.
Following is the syntax for specifying the in-place update −

virtual std::vector<std::pair<int, void*>> ElewiseOpProperty::ForwardInplaceOption(
   const std::vector<int> &in_data,
   const std::vector<void*> &out_data)
const {
   return { {in_data[0], out_data[0]} };
}
virtual std::vector<std::pair<int, void*>> ElewiseOpProperty::BackwardInplaceOption(
   const std::vector<int> &out_grad,
   const std::vector<int> &in_data,
   const std::vector<int> &out_data,
   const std::vector<void*> &in_grad)
const {
   return { {out_grad[0], in_grad[0]} };
}

Example for Creating an Operator
With the help of OperatorProperty we can create an operator. To do so, follow the steps given below −
Step 1 − Create Operator
First, implement the following interface in OperatorProperty −
virtual Operator* CreateOperator(Context ctx) const = 0;
The example is given below −
class ConvolutionOp {
public:
void Forward( ... ) { ... }
void Backward( ... ) { ... }
};
class ConvolutionOpProperty : public OperatorProperty {
public:
Operator* CreateOperator(Context ctx) const {
return new ConvolutionOp;
}
};
Step 2 − Parameterize Operator
If you are going to implement a convolution operator, it is mandatory to know the kernel size, the stride size, the padding size, and so on, because these parameters should be passed to the operator before any Forward or Backward interface is called.
For this, we need to define a ConvolutionParam structure, as below −
#include <dmlc/parameter.h>
struct ConvolutionParam : public dmlc::Parameter<ConvolutionParam> {
mxnet::TShape kernel, stride, pad;
uint32_t num_filter, num_group, workspace;
bool no_bias;
};
Now, we need to put this in ConvolutionOpProperty and pass it to the operator as follows −
class ConvolutionOp {
public:
ConvolutionOp(ConvolutionParam p): param_(p) {}
void Forward( ... ) { ... }
void Backward( ... ) { ... }
private:
ConvolutionParam param_;
};
class ConvolutionOpProperty : public OperatorProperty {
public:
void Init(const vector<pair<string, string>>& kwargs) {
// initialize param_ using kwargs
}
Operator* CreateOperator(Context ctx) const {
return new ConvolutionOp(param_);
}
private:
ConvolutionParam param_;
};
Step 3 − Register the Operator Property Class and the Parameter Class to Apache MXNet
At last, we need to register the operator property class and the parameter class to MXNet. This can be done with the help of the following macros −
DMLC_REGISTER_PARAMETER(ConvolutionParam);
MXNET_REGISTER_OP_PROPERTY(Convolution, ConvolutionOpProperty);
In the above macros, the first argument is the name string and the second is the property class name.