CNTK - Creating First Neural Network
This chapter will elaborate on creating a neural network in CNTK.
Build the network structure
In order to apply CNTK concepts to build our first NN, we are going to use a NN to classify species of iris flowers based on the physical properties of sepal width and length and petal width and length. The dataset we will be using is the iris dataset, which describes the physical properties of different varieties of iris flowers (a short data-loading sketch follows the list) −
- Sepal length
- Sepal width
- Petal length
- Petal width
- Class, i.e. iris setosa, iris versicolor, or iris virginica
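For reference, here is a minimal sketch of how these samples could be loaded and one-hot encoded; it assumes scikit-learn is installed, which is not part of CNTK itself −

from sklearn.datasets import load_iris
import numpy as np

iris = load_iris()
X = iris.data.astype(np.float32)  # 150 samples with the four features above

# one-hot encode the three classes (setosa, versicolor, virginica)
y = np.eye(3, dtype=np.float32)[iris.target]

print(X.shape, y.shape)  # (150, 4) (150, 3)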
Here, we will be building a regular NN called a feedforward NN. Let us see the implementation steps to build the structure of the NN −
Step 1 − First, we will import the necessary components, such as our layer types, activation functions, and a function that allows us to define an input variable for our NN, from the CNTK library.
from cntk import default_options, input_variable
from cntk.layers import Dense, Sequential
from cntk.ops import log_softmax, relu
Step 2 − After that, we will create our model using the Sequential function. Once created, we will feed it with the layers we want. Here, we are going to create two distinct layers in our NN; one with four neurons and another with three neurons.
model = Sequential([Dense(4, activation=relu), Dense(3, activation=log_softmax)])
Step 3 − At last, in order to compile the NN, we will bind the network to the input variable. It will have an input layer with four neurons and an output layer with three neurons.
feature = input_variable(4)
z = model(feature)
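At this point z is already a callable CNTK function object. As a quick sanity check (a minimal sketch, assuming numpy is available), we can run it on a random feature vector; the outputs are meaningless until the network is trained −

import numpy as np

# one random sample with four feature values, shaped (batch_size, 4)
x = np.random.rand(1, 4).astype(np.float32)
print(z.eval({feature: x}))  # three log-softmax outputs, one per class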
Applying an activation function
There are lots of activation functions to choose from, and choosing the right activation function will definitely make a big difference to how well our deep learning model performs.
At the output layer
Choosing an activation function at the output layer will depend upon the kind of problem we are going to solve with our model −
- For a regression problem, we should use a linear activation function on the output layer.
- For a binary classification problem, we should use a sigmoid activation function on the output layer.
- For a multi-class classification problem, we should use a softmax activation function on the output layer.

Here, we are going to build a model for predicting one of three classes. It means we need to use a softmax activation function at the output layer; a sketch of these choices follows.
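To make the choices above concrete, here is a hedged sketch of how each kind of output layer could be declared in CNTK (sigmoid and softmax come from cntk.ops; the variable names are purely illustrative) −

from cntk.layers import Dense
from cntk.ops import sigmoid, softmax

# regression: a single linear output, no activation applied
regression_output = Dense(1, activation=None)

# binary classification: sigmoid squashes the output into (0, 1)
binary_output = Dense(1, activation=sigmoid)

# multi-class classification: softmax turns three outputs into a probability distribution
multiclass_output = Dense(3, activation=softmax)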
At the hidden layer
Choosing an activation function at the hidden layer requires some experimentation and monitoring of the performance to see which activation function works best −
- In a classification problem, we need to predict the probability that a sample belongs to a specific class. That's why we need an activation function that gives us probabilistic values. To reach this goal, the sigmoid activation function can help us.
- One of the major problems associated with the sigmoid function is the vanishing gradient problem. To overcome this problem, we can use the ReLU activation function, which converts all negative values to zero and works as a pass-through filter for positive values. Both variants are sketched below.
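As a small sketch of that experimentation (the model names are illustrative; both variants keep the layer shapes used in our model above) −

from cntk.layers import Dense, Sequential
from cntk.ops import log_softmax, relu, sigmoid

# sigmoid hidden layer: bounded, probabilistic outputs, but gradients shrink for large inputs
model_sigmoid = Sequential([Dense(4, activation=sigmoid), Dense(3, activation=log_softmax)])

# relu hidden layer: negatives become zero, positives pass through unchanged
model_relu = Sequential([Dense(4, activation=relu), Dense(3, activation=log_softmax)])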
Picking a loss function
Once we have the structure for our NN model, we must optimise it. For optimisation we need a loss function. Unlike activation functions, we have very few loss functions to choose from. However, choosing a loss function will depend upon the kind of problem we are going to solve with our model.

For example, in a classification problem, we should use a loss function that can measure the difference between a predicted class and an actual class.
For the classification problem we are going to solve with our NN model, the categorical cross entropy loss function is the best candidate. In CNTK, it is implemented as cross_entropy_with_softmax, which can be imported from the cntk.losses package, as follows −
from cntk.losses import cross_entropy_with_softmax

label = input_variable(3)
loss = cross_entropy_with_softmax(z, label)
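As a quick check (a minimal sketch, assuming numpy is available), the untrained loss can be evaluated on a dummy sample with a one-hot label; the exact value is meaningless before training −

import numpy as np

# one random feature vector and a one-hot label for the first class
x = np.random.rand(1, 4).astype(np.float32)
y = np.array([[1.0, 0.0, 0.0]], dtype=np.float32)

# Function.eval takes a dict mapping input variables to data
print(loss.eval({feature: x, label: y}))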
Metrics
With the structure for our NN model and a loss function to apply, we have all the ingredients to start making the recipe for optimising our deep learning model. But, before diving into this, we should learn about metrics.
cntk.metrics
CNTK has a package named cntk.metrics from which we can import the metrics we are going to use. As we are building a classification model, we will be using the classification_error metric, which produces a number between 0 and 1. This number indicates the fraction of samples that were predicted incorrectly −

First, we need to import the metric from the cntk.metrics package −
from cntk.metrics import classification_error
error_rate = classification_error(z, label)
The above function actually needs the output of the NN and the expected label as input.
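Putting the pieces of this chapter together, the complete network definition reads as follows; this is a recap sketch of the code above, not new functionality −

from cntk import input_variable
from cntk.layers import Dense, Sequential
from cntk.losses import cross_entropy_with_softmax
from cntk.metrics import classification_error
from cntk.ops import log_softmax, relu

# network: four inputs -> four hidden relu neurons -> three log-softmax outputs
model = Sequential([Dense(4, activation=relu), Dense(3, activation=log_softmax)])
feature = input_variable(4)
label = input_variable(3)

z = model(feature)
loss = cross_entropy_with_softmax(z, label)
error_rate = classification_error(z, label)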