Microsoft Cognitive Toolkit Tutorial

CNTK - Neural Network (NN) Concepts

This chapter deals with the concepts of neural networks with regard to CNTK.

As we know, several layers of neurons are used to make a neural network. But the question arises: in CNTK, how can we model the layers of an NN? It can be done with the help of the layer functions defined in the layers module.

Layer function

Actually, in CNTK, working with layers has a distinct functional programming feel to it. A layer function looks like a regular function, and it produces a mathematical function with a set of predefined parameters. Let’s see how we can create the most basic layer type, Dense, with the help of a layer function.

Example

With the help of the following basic steps, we can create the most basic layer type −

Step 1 − First, we need to import the Dense layer function from the layers package of CNTK.

from cntk.layers import Dense

Step 2 − Next, we need to import the input_variable function from the CNTK root package.

from cntk import input_variable

Step 3 − Now, we need to create a new input variable using the input_variable function. We also need to provide its size.

feature = input_variable(100)

Step 4 − At last, we will create a new layer using the Dense function, providing the number of neurons we want.

layer = Dense(40)(feature)

Now, we can invoke the configured Dense layer function to connect the Dense layer to the input.

Complete implementation example

from cntk.layers import Dense
from cntk import input_variable

# 100-dimensional input connected to a Dense layer of 40 neurons
feature = input_variable(100)
layer = Dense(40)(feature)
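
To verify that the layer is wired up correctly, we can evaluate it on a random batch. This is a minimal sketch, assuming NumPy is available; the random sample is purely illustrative.

import numpy as np
from cntk.layers import Dense
from cntk import input_variable

feature = input_variable(100)
layer = Dense(40)(feature)

# Evaluate on one random sample of 100 features
sample = np.random.rand(1, 100).astype(np.float32)
print(layer.eval({feature: sample}).shape)   # expected: (1, 40), one value per neuron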

Customizing layers

As we have seen, CNTK provides us with a pretty good set of defaults for building NNs. Based on the activation function and the other settings we choose, the behavior as well as the performance of the NN will be different. That is the reason it is good to understand what we can configure.

Steps to configure a Dense layer

Each layer in an NN has its own unique configuration options, and when we talk about the Dense layer, we have the following important settings to define −

  1. shape − As the name implies, it defines the output shape of the layer, which in turn determines the number of neurons in that layer.

  2. activation − It defines the activation function of that layer, so it can transform the input data.

  3. init − It defines the initialisation function of that layer. It will initialise the parameters of the layer when we start training the NN.

Let’s see the steps with the help of which we can configure a Dense layer −

Step 1 − First, we need to import the Dense layer function from the layers package of CNTK.

from cntk.layers import Dense

Step 2 − Next, we need to import the sigmoid operator from the CNTK ops package. It will be used as the activation function for the layer.

from cntk.ops import sigmoid

Step 3 − Now, we need to import the glorot_uniform initializer from the initializer package.

from cntk.initializer import glorot_uniform

Step 4 − At last, we will create a new layer using the Dense function, providing the number of neurons as the first argument. Also, we will provide the sigmoid operator as the activation function and glorot_uniform as the init function for the layer.

layer = Dense(50, activation = sigmoid, init = glorot_uniform)

Complete implementation example

from cntk.layers import Dense
from cntk.ops import sigmoid
from cntk.initializer import glorot_uniform

# Dense layer of 50 neurons with sigmoid activation and Glorot uniform initialisation
layer = Dense(50, activation = sigmoid, init = glorot_uniform)
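
As a quick usage check, the configured layer function can then be connected to an input variable just as before. This is a small sketch; the 20-dimensional input is chosen arbitrarily for illustration.

from cntk import input_variable

feature = input_variable(20)
model = layer(feature)        # connect the configured Dense layer to the input
print(model.output.shape)     # expected: (50,), one output per neuron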

Optimizing the parameters

Till now, we have seen how to create the structure of an NN and how to configure its various settings. Here, we will see how we can optimise the parameters of an NN. With the help of the combination of two components, namely learners and trainers, we can optimise the parameters of an NN.

Trainer component

The first component used to optimise the parameters of an NN is the trainer component. It basically implements the backpropagation process. If we talk about its working, it passes the data through the NN to obtain a prediction.

After that, it uses another component, called the learner, in order to obtain the new values for the parameters of the NN. Once it obtains the new values, it applies them and repeats the process until an exit criterion is met.
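
The sketch below shows how these pieces are typically wired together. It is illustrative only: the toy model and random data are made up, and it assumes a CNTK 2.2+ build where learner factories accept plain floats for the learning rate (older releases require a learning_rate_schedule).

import numpy as np
from cntk import input_variable, Trainer
from cntk.layers import Dense
from cntk.losses import squared_error
from cntk.learners import sgd

# A made-up toy model: 10 features in, 1 value out
feature = input_variable(10)
label = input_variable(1)
model = Dense(1)(feature)

# The trainer combines the model, the loss to backpropagate and a learner
loss = squared_error(model, label)
trainer = Trainer(model, (loss, loss), [sgd(model.parameters, lr=0.1)])

# One backpropagation step on a random minibatch of 4 samples
x = np.random.rand(4, 10).astype(np.float32)
y = np.random.rand(4, 1).astype(np.float32)
trainer.train_minibatch({feature: x, label: y})
print(trainer.previous_minibatch_loss_average)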

Learner component

The second component used to optimise the parameters of an NN is the learner component, which is basically responsible for performing the gradient descent algorithm.

Learners included in the CNTK library

Following is a list of some of the interesting learners included in the CNTK library; a short construction sketch follows the list −

  1. Stochastic Gradient Descent (SGD) − This learner represents the basic stochastic gradient descent, without any extras.

  2. Momentum Stochastic Gradient Descent (MomentumSGD) − This learner adds momentum to SGD in order to overcome the problem of getting stuck in local minima.

  3. RMSProp − In order to control the rate of descent, this learner uses decaying learning rates.

  4. Adam − In order to decrease the rate of descent over time, this learner uses decaying momentum.

  5. Adagrad − This learner uses different learning rates for frequently as well as infrequently occurring features.
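
As a rough sketch of how some of these learners are constructed (the hyperparameter values are illustrative, not tuned, and again assume a CNTK 2.2+ build that accepts plain floats for the schedules):

from cntk import input_variable
from cntk.layers import Dense
from cntk.learners import sgd, momentum_sgd, adam, adagrad

feature = input_variable(100)
model = Dense(40)(feature)

# Every learner factory takes the model parameters plus its own hyperparameters
sgd_learner = sgd(model.parameters, lr=0.1)
momentum_learner = momentum_sgd(model.parameters, lr=0.1, momentum=0.9)
adam_learner = adam(model.parameters, lr=0.001, momentum=0.9)
adagrad_learner = adagrad(model.parameters, lr=0.1)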