Apache MXNet Concise Tutorial

Apache MXNet - Unified Operator API

This chapter describes the unified operator application programming interface (API) in Apache MXNet.

SimpleOp

SimpleOp is a unified operator API that brings the different invoking processes under one interface and reduces an operator to its fundamental elements. It is designed specifically for unary and binary operations. The reason is that most mathematical operators take one or two operands, and operators with more operands are the ones for which dependency-related optimization becomes useful.

We will see how SimpleOp works with the help of an example: an operator that computes the smooth l1 loss, which is a mixture of l1 and l2 loss. The loss and its gradient can be written as given below −

loss = outside_weight .* f(inside_weight .* (data - label))
grad = outside_weight .* inside_weight .* f'(inside_weight .* (data - label))

In the example above,

.* stands for element-wise multiplication
f and f’ are the smooth l1 loss function and its derivative, which we assume are available in mshadow.

At first glance it looks impossible to implement this loss as a unary or binary operator. However, MXNet provides automatic differentiation in symbolic execution, which reduces the loss to f and f’ directly. That is why we can implement this loss as a unary operator.
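To make the formulas concrete, here is a minimal standalone sketch of the loss and gradient on plain vectors. It does not use MXNet or mshadow: the names SmoothL1, SmoothL1Grad, WeightedLoss, and WeightedGrad are hypothetical, the piecewise form of f is an assumption chosen to match the scalar mapper shown at the end of this chapter, and all input vectors are assumed to have the same length.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed scalar smooth l1 function f: quadratic near zero, linear elsewhere.
// sigma2 controls where the quadratic region ends (at |x| = 1 / sigma2).
inline float SmoothL1(float x, float sigma2) {
  const float ax = std::fabs(x);
  return ax < 1.0f / sigma2 ? 0.5f * sigma2 * x * x : ax - 0.5f / sigma2;
}

// Derivative f' of the function above.
inline float SmoothL1Grad(float x, float sigma2) {
  if (x > 1.0f / sigma2) return 1.0f;
  if (x < -1.0f / sigma2) return -1.0f;
  return sigma2 * x;
}

// loss = outside_weight .* f(inside_weight .* (data - label))
inline std::vector<float> WeightedLoss(const std::vector<float>& data,
                                       const std::vector<float>& label,
                                       const std::vector<float>& inside_weight,
                                       const std::vector<float>& outside_weight,
                                       float sigma2) {
  std::vector<float> loss(data.size());
  for (std::size_t i = 0; i < data.size(); ++i) {
    const float d = inside_weight[i] * (data[i] - label[i]);
    loss[i] = outside_weight[i] * SmoothL1(d, sigma2);
  }
  return loss;
}

// grad = outside_weight .* inside_weight .* f'(inside_weight .* (data - label))
inline std::vector<float> WeightedGrad(const std::vector<float>& data,
                                       const std::vector<float>& label,
                                       const std::vector<float>& inside_weight,
                                       const std::vector<float>& outside_weight,
                                       float sigma2) {
  std::vector<float> grad(data.size());
  for (std::size_t i = 0; i < data.size(); ++i) {
    const float d = inside_weight[i] * (data[i] - label[i]);
    grad[i] = outside_weight[i] * inside_weight[i] * SmoothL1Grad(d, sigma2);
  }
  return grad;
}
```

The weights and the subtraction are ordinary element-wise operations, so the only piece that needs a dedicated operator is the pair f and f’, which is what the rest of the chapter implements.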

Defining Shapes

MXNet’s mshadow library requires explicit memory allocation, so all data shapes must be provided before any calculation occurs. Before defining functions and gradients, we need to check input shape consistency and provide the output shape as follows:

typedef mxnet::TShape (*UnaryShapeFunction)(const mxnet::TShape& src,
   const EnvArguments& env);
typedef mxnet::TShape (*BinaryShapeFunction)(const mxnet::TShape& lhs,
   const mxnet::TShape& rhs,
   const EnvArguments& env);

A shape function takes mxnet::TShape arguments, checks the input data shapes, and designates the output data shape. If you do not define one, the default output shape is the same as the input shape. For a binary operator, lhs and rhs are checked to have the same shape by default.
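The default check described for binary operators can be pictured with a small standalone sketch. The Shape alias is a hypothetical stand-in for mxnet::TShape, SameShapeBinary is an illustrative name, and throwing an exception on mismatch is an assumption of this sketch rather than a description of how MXNet reports the error.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for mxnet::TShape: one extent per dimension.
using Shape = std::vector<std::size_t>;

// Checks that both operands have identical shapes and returns the output
// shape, which for an element-wise binary operator equals the input shape.
inline Shape SameShapeBinary(const Shape& lhs, const Shape& rhs) {
  if (lhs != rhs) {
    throw std::invalid_argument("lhs and rhs must have the same shape");
  }
  return lhs;
}
```

A custom shape function is only needed when the output shape differs from the input, for example in a reduction.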

Now let’s move on to our smooth l1 loss example. In the header implementation smooth_l1_unary-inl.h, we define XPU as either cpu or gpu, so that the same code can be reused in smooth_l1_unary.cc and smooth_l1_unary.cu.

#include <mxnet/operator_util.h>
#if defined(__CUDACC__)
   #define XPU gpu
#else
   #define XPU cpu
#endif
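The effect of the macro can be seen in a standalone sketch. The cpu and gpu structs below are hypothetical stand-ins for the mshadow device tags, their kDevMask values are illustrative, and DeviceName is an invented helper. The point is only that one template body serves both devices once XPU selects the tag.

```cpp
#include <string>

// Hypothetical stand-ins for the mshadow device tag types.
struct cpu { static constexpr int kDevMask = 1; };
struct gpu { static constexpr int kDevMask = 2; };

// Same selection logic as in the header: the CUDA compiler picks gpu,
// every other compiler picks cpu.
#if defined(__CUDACC__)
#define XPU gpu
#else
#define XPU cpu
#endif

// A single templated body that is instantiated per device.
template <typename xpu>
std::string DeviceName() {
  return xpu::kDevMask == gpu::kDevMask ? "gpu" : "cpu";
}
```

With this layout, the .cc file and the .cu file can both include the same header and instantiate the shared code with XPU.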

In our smooth l1 loss example the output has the same shape as the source, so the default behavior is sufficient. Written out explicitly, the shape function would be −

inline mxnet::TShape SmoothL1Shape_(const mxnet::TShape& src,const EnvArguments& env) {
   return mxnet::TShape(src);
}

Defining Functions

We can create a unary or binary function with one output (a TBlob) as follows −

typedef void (*UnaryFunction)(const TBlob& src,
   const EnvArguments& env,
   TBlob* ret,
   OpReqType req,
   RunContext ctx);
typedef void (*BinaryFunction)(const TBlob& lhs,
   const TBlob& rhs,
   const EnvArguments& env,
   TBlob* ret,
   OpReqType req,
   RunContext ctx);

The RunContext ctx struct contains the information needed at runtime for execution −

struct RunContext {
   void *stream; // the stream of the device, can be NULL or Stream<gpu>* in GPU mode
   template<typename xpu> inline mshadow::Stream<xpu>* get_stream(); // get mshadow stream from Context
};

The OpReqType argument specifies how the computation results should be written to ret.

enum OpReqType {
   kNullOp, // no operation, do not write anything
   kWriteTo, // write gradient to provided space
   kWriteInplace, // perform an in-place write
   kAddTo // add to the provided space
};
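How a request type changes the write can be sketched without MXNet. Assign below is a hypothetical helper that plays the role the ASSIGN_DISPATCH macro plays in the listings of this chapter. Treating kWriteTo and kWriteInplace identically at the point of assignment is an assumption of this sketch, and the output buffer is assumed to be at least as long as the value vector.

```cpp
#include <cstddef>
#include <vector>

enum OpReqType {
  kNullOp,        // no operation, do not write anything
  kWriteTo,       // write to the provided space
  kWriteInplace,  // perform an in-place write
  kAddTo          // add to the provided space
};

// Hypothetical helper: stores `value` into `out` according to `req`.
inline void Assign(std::vector<float>* out, OpReqType req,
                   const std::vector<float>& value) {
  for (std::size_t i = 0; i < value.size(); ++i) {
    switch (req) {
      case kNullOp:
        break;  // leave the output untouched
      case kWriteTo:
      case kWriteInplace:
        (*out)[i] = value[i];  // overwrite
        break;
      case kAddTo:
        (*out)[i] += value[i];  // accumulate
        break;
    }
  }
}
```

Accumulation with kAddTo is what lets several gradient contributions be summed into one buffer.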

Now, let’s return to our smooth l1 loss example. We use UnaryFunction to define the function of this operator as follows:

template<typename xpu>
void SmoothL1Forward_(const TBlob& src,
   const EnvArguments& env,
   TBlob *ret,
   OpReqType req,
   RunContext ctx) {
   using namespace mshadow;
   using namespace mshadow::expr;
   mshadow::Stream<xpu> *s = ctx.get_stream<xpu>();
   real_t sigma2 = env.scalar * env.scalar;
   MSHADOW_TYPE_SWITCH(ret->type_flag_, DType, {
      mshadow::Tensor<xpu, 2, DType> out = ret->get<xpu, 2, DType>(s);
      mshadow::Tensor<xpu, 2, DType> in = src.get<xpu, 2, DType>(s);
      ASSIGN_DISPATCH(out, req,
      F<mshadow_op::smooth_l1_loss>(in, ScalarExp<DType>(sigma2)));
   });
}

Defining Gradients

Gradient functions of binary operators have a similar structure to the unary ones, except that Input, TBlob, and OpReqType are doubled. Below are the gradient function types for unary operators, which differ in the inputs they depend on:

// depending only on out_grad
typedef void (*UnaryGradFunctionT0)(const OutputGrad& out_grad,
   const EnvArguments& env,
   TBlob* in_grad,
   OpReqType req,
   RunContext ctx);
// depending only on out_value
typedef void (*UnaryGradFunctionT1)(const OutputGrad& out_grad,
   const OutputValue& out_value,
   const EnvArguments& env,
   TBlob* in_grad,
   OpReqType req,
   RunContext ctx);
// depending only on in_data
typedef void (*UnaryGradFunctionT2)(const OutputGrad& out_grad,
   const Input0& in_data0,
   const EnvArguments& env,
   TBlob* in_grad,
   OpReqType req,
   RunContext ctx);

Input0, Input, OutputValue, and OutputGrad all share the structure of GradFunctionArgument, which is defined as follows −

struct GradFunctionArgument {
   TBlob data;
};

Now let’s return to our smooth l1 loss example. To apply the chain rule, we need to multiply out_grad, the gradient arriving from the top, into the result written to in_grad.

template<typename xpu>
void SmoothL1BackwardUseIn_(const OutputGrad& out_grad, const Input0& in_data0,
   const EnvArguments& env,
   TBlob *in_grad,
   OpReqType req,
   RunContext ctx) {
   using namespace mshadow;
   using namespace mshadow::expr;
   mshadow::Stream<xpu> *s = ctx.get_stream<xpu>();
   real_t sigma2 = env.scalar * env.scalar;
   MSHADOW_TYPE_SWITCH(in_grad->type_flag_, DType, {
      mshadow::Tensor<xpu, 2, DType> src = in_data0.data.get<xpu, 2, DType>(s);
      mshadow::Tensor<xpu, 2, DType> ograd = out_grad.data.get<xpu, 2, DType>(s);
      mshadow::Tensor<xpu, 2, DType> igrad = in_grad->get<xpu, 2, DType>(s);
      ASSIGN_DISPATCH(igrad, req,
      ograd * F<mshadow_op::smooth_l1_gradient>(src, ScalarExp<DType>(sigma2)));
   });
}
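The chain-rule step in the listing above can be checked on plain vectors. This is a standalone sketch: SmoothL1Derivative and SmoothL1BackwardSketch are hypothetical names, the derivative is assumed to be 1, -1, or sigma2 * x depending on the region, and out_grad and in_data are assumed to have the same length.

```cpp
#include <cstddef>
#include <vector>

// Assumed derivative of the smooth l1 function for a given sigma2.
inline float SmoothL1Derivative(float x, float sigma2) {
  if (x > 1.0f / sigma2) return 1.0f;
  if (x < -1.0f / sigma2) return -1.0f;
  return sigma2 * x;
}

// in_grad[i] = out_grad[i] * f'(in_data[i]), with sigma2 = sigma * sigma
// computed from the scalar argument as in the listing above.
inline std::vector<float> SmoothL1BackwardSketch(
    const std::vector<float>& out_grad,
    const std::vector<float>& in_data,
    float sigma) {
  const float sigma2 = sigma * sigma;
  std::vector<float> in_grad(in_data.size());
  for (std::size_t i = 0; i < in_data.size(); ++i) {
    in_grad[i] = out_grad[i] * SmoothL1Derivative(in_data[i], sigma2);
  }
  return in_grad;
}
```

For example, with sigma = 1, out_grad = {2, 3, 4} and in_data = {2, -2, 0.25}, the sketch yields {2, -3, 1}.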

Register SimpleOp to MXNet

Once we have created the shape, function, and gradient, we need to register them as both an NDArray operator and a symbolic operator. We can use the registration macro as follows −

MXNET_REGISTER_SIMPLE_OP(Name, DEV)
   .set_shape_function(Shape)
   .set_function(DEV::kDevMask, Function<XPU>, SimpleOpInplaceOption)
   .set_gradient(DEV::kDevMask, Gradient<XPU>, SimpleOpInplaceOption)
   .describe("description");
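The chained calls in the macro follow a common fluent pattern in which every setter returns a reference to the entry being configured. The class below is a hypothetical miniature written for illustration. It is not MXNet's registry class and it models only two of the setters.

```cpp
#include <string>
#include <utility>

// Hypothetical miniature of a registry entry with chainable setters.
class OpEntrySketch {
 public:
  explicit OpEntrySketch(std::string name) : name_(std::move(name)) {}

  // Each setter returns *this so that calls can be chained.
  OpEntrySketch& set_enable_scalar(bool enable) {
    enable_scalar_ = enable;
    return *this;
  }
  OpEntrySketch& describe(const std::string& text) {
    description_ = text;
    return *this;
  }

  const std::string& name() const { return name_; }
  bool enable_scalar() const { return enable_scalar_; }
  const std::string& description() const { return description_; }

 private:
  std::string name_;
  bool enable_scalar_ = false;
  std::string description_;
};
```

An entry could then be configured in one statement, for example OpEntrySketch("smooth_l1").set_enable_scalar(true).describe("...").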

SimpleOpInplaceOption is defined as follows −

enum SimpleOpInplaceOption {
   kNoInplace, // do not allow inplace in arguments
   kInplaceInOut, // allow inplace in with out (unary)
   kInplaceOutIn, // allow inplace out_grad with in_grad (unary)
   kInplaceLhsOut, // allow inplace left operand with out (binary)
   kInplaceOutLhs // allow inplace out_grad with lhs_grad (binary)
};

Now let’s return to our smooth l1 loss example. Its gradient function relies on the input data, so the function cannot be written in place.

MXNET_REGISTER_SIMPLE_OP(smooth_l1, XPU)
.set_function(XPU::kDevMask, SmoothL1Forward_<XPU>, kNoInplace)
.set_gradient(XPU::kDevMask, SmoothL1BackwardUseIn_<XPU>, kInplaceOutIn)
.set_enable_scalar(true)
.describe("Calculate Smooth L1 Loss(lhs, scalar)");
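The kind of hazard that kNoInplace avoids can be shown with a standalone sketch. All names are hypothetical and sigma2 is fixed to 1 for brevity. The sketch overwrites the input buffer with the forward result and then computes the gradient from that buffer, as a backward pass would if the original input had been lost.

```cpp
#include <cmath>
#include <vector>

// Smooth l1 function and derivative with sigma2 fixed to 1 (assumption).
inline float LossValue(float x) {
  const float ax = std::fabs(x);
  return ax < 1.0f ? 0.5f * x * x : ax - 0.5f;
}
inline float LossDerivative(float x) {
  if (x > 1.0f) return 1.0f;
  if (x < -1.0f) return -1.0f;
  return x;
}

// Forward pass that overwrites its input, as an in-place write would.
inline void ForwardInPlace(std::vector<float>* buffer) {
  for (float& v : *buffer) v = LossValue(v);
}

// Gradient computed from whatever the buffer currently holds.
inline std::vector<float> GradientFrom(const std::vector<float>& buffer) {
  std::vector<float> grad;
  grad.reserve(buffer.size());
  for (float v : buffer) grad.push_back(LossDerivative(v));
  return grad;
}
```

For an input of -3 the correct derivative is -1. After ForwardInPlace the buffer holds 2.5, and GradientFrom returns +1, which has the wrong sign.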

SimpleOp on EnvArguments

Some operations might need the following −

A scalar as input such as a gradient scale
A set of keyword arguments controlling behavior
A temporary space to speed up calculations.

EnvArguments provides these additional arguments and resources, which makes calculations more scalable and efficient.

Example

First, the struct is defined as below −

struct EnvArguments {
   real_t scalar; // scalar argument, if enabled
   std::vector<std::pair<std::string, std::string> > kwargs; // keyword arguments
   std::vector<Resource> resource; // pointer to the resources requested
};
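Reading a keyword argument from the kwargs member amounts to a linear search over key/value pairs. The sketch below is standalone: EnvArgumentsSketch copies the struct above but omits the Resource member, real_t is assumed to be float, and GetKwarg is a hypothetical helper, not part of MXNet.

```cpp
#include <string>
#include <utility>
#include <vector>

using real_t = float;  // assumption: stands in for the real_t used above

// Copy of the struct above without the Resource member.
struct EnvArgumentsSketch {
  real_t scalar;  // scalar argument, if enabled
  std::vector<std::pair<std::string, std::string> > kwargs;  // keyword arguments
};

// Hypothetical helper: returns the value stored under `key`,
// or `fallback` when the key is absent.
inline std::string GetKwarg(const EnvArgumentsSketch& env,
                            const std::string& key,
                            const std::string& fallback) {
  for (const auto& kv : env.kwargs) {
    if (kv.first == key) return kv.second;
  }
  return fallback;
}
```

Values arrive as strings, so an operator that expects a number has to parse it itself.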

Next, to obtain additional resources such as mshadow::Random<xpu> and temporary memory space through EnvArguments.resource, we need to request them. A request is described as follows −

struct ResourceRequest {
   enum Type { // Resource type, indicating what the pointer type is
      kRandom, // mshadow::Random<xpu> object
      kTempSpace // A dynamic temp space that can be arbitrary size
   };
   Type type; // type of resources
};

The registration then requests the declared resources from mxnet::ResourceManager and places them in the std::vector<Resource> resource member of EnvArguments.

We can access the resources with the following code −

auto tmp_space_res = env.resource[0].get_space(some_shape, some_stream);
auto rand_res = env.resource[0].get_random(some_stream);

In our smooth l1 loss example, a scalar input is needed to mark the turning point of the loss function. That is why we call set_enable_scalar(true) during registration and use env.scalar in the function and gradient definitions.

Building Tensor Operation

Why would we need to craft tensor operations ourselves? The reasons are as follows −

Computation uses the mshadow library, which does not always have the function we need readily available.
Some operations are not element-wise, such as softmax loss and its gradient.

Example

Here, we continue with the smooth l1 loss example. We create two mappers, namely the scalar cases of the smooth l1 loss and its gradient:

namespace mshadow_op {
   struct smooth_l1_loss {
      // a is x, b is sigma2
      MSHADOW_XINLINE static real_t Map(real_t a, real_t b) {
         if (a > 1.0f / b) {
            return a - 0.5f / b;
         } else if (a < -1.0f / b) {
            return -a - 0.5f / b;
         } else {
            return 0.5f * a * a * b;
         }
      }
   };
}
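The text mentions two mappers, while the listing above covers only the loss. A gradient mapper consistent with it can be obtained by differentiating each branch. The following is a standalone sketch under stated assumptions: real_t is taken to be float, MSHADOW_XINLINE is replaced by a plain inline stand-in, and the namespace name mshadow_op_sketch is hypothetical.

```cpp
typedef float real_t;           // assumption: stands in for mshadow's real_t
#define MSHADOW_XINLINE inline  // assumption: simplified stand-in for the mshadow macro

namespace mshadow_op_sketch {
struct smooth_l1_gradient {
  // a is x, b is sigma2
  MSHADOW_XINLINE static real_t Map(real_t a, real_t b) {
    if (a > 1.0f / b) {
      return 1.0f;   // derivative of a - 0.5 / b
    } else if (a < -1.0f / b) {
      return -1.0f;  // derivative of -a - 0.5 / b
    } else {
      return a * b;  // derivative of 0.5 * a * a * b
    }
  }
};
}  // namespace mshadow_op_sketch
```

The branches use the same thresholds as the loss mapper, so the two stay consistent at the turning points.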