Microsoft Cognitive Toolkit Tutorial
CNTK - Convolutional Neural Network
In this chapter, let us study how to construct a Convolutional Neural Network (CNN) in CNTK.
Introduction
Convolutional neural networks (CNNs) are also made up of neurons that have learnable weights and biases. In this respect, they are like ordinary neural networks (NNs).
If we recall the working of ordinary NNs, every neuron receives one or more inputs, takes a weighted sum and passes it through an activation function to produce the final output. Here the question arises: if CNNs and ordinary NNs have so many similarities, what makes these two networks different from each other?
What makes them different is the treatment of the input data and the types of layers. In an ordinary NN, the structure of the input data is ignored and all the data is converted into a 1-D array before being fed into the network.
A Convolutional Neural Network architecture, on the other hand, can take the 2-D structure of images into account, process them and extract the properties that are specific to images. Moreover, CNNs have one or more convolutional layers and pooling layers, which are the main building blocks of CNNs.
These layers are followed by one or more fully connected layers, as in standard multilayer NNs. So, we can think of a CNN as a special case of a fully connected network.
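To make the difference concrete, here is a minimal NumPy sketch (the shapes are illustrative, not part of the tutorial's own code) showing what flattening for an ordinary NN throws away, compared with the (channels, height, width) volume a CNN consumes directly:

import numpy as np

image = np.zeros((3, 28, 28))   # a 3-channel 28x28 image volume, as a CNN receives it
flat = image.reshape(-1)        # ordinary NN input: one long 1-D array
print(flat.shape)               # (2352,) - the spatial layout of the pixels is lost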
Convolutional Neural Network (CNN) architecture
The architecture of a CNN is basically a list of layers that transforms a 3-dimensional image volume, i.e. its width, height and depth, into a 3-dimensional output volume. One important point to note here is that every neuron in the current layer is connected to a small patch of the output from the previous layer, which is like overlaying an N×N filter on the input image.
It uses M filters, which are basically feature extractors that extract features like edges, corners and so on. Following are the layers [INPUT-CONV-RELU-POOL-FC] that are used to construct Convolutional Neural Networks (CNNs) −
- INPUT − As the name implies, this layer holds the raw pixel values, i.e. the data of the image as it is. For example, INPUT [64×64×3] is a 3-channel RGB image of width 64, height 64 and depth 3.
- CONV − This layer is one of the building blocks of CNNs, as most of the computation is done in this layer. For example, if we use 6 filters on the above-mentioned INPUT [64×64×3], this may result in the volume [64×64×6] (see the shape sketch after this list).
- RELU − Also called the rectified linear unit layer, it applies an activation function to the output of the previous layer. In other words, RELU adds non-linearity to the network.
- POOL − This layer, i.e. the pooling layer, is another building block of CNNs. Its main task is down-sampling, which means it operates independently on every slice of the input and resizes it spatially.
- FC − It is called the fully connected layer or, more specifically, the output layer. It is used to compute the output class scores, and the resulting output is a volume of size 1×1×L, where L is the number of classes.
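To check the CONV example above, here is a small helper based on the standard output-size formula (a sketch; the function name and the 3×3/padding-1 values are illustrative, not taken from the tutorial):

def conv_output_size(n, f, pad, stride):
   # standard formula: floor((n - f + 2*pad) / stride) + 1
   return (n - f + 2 * pad) // stride + 1

# INPUT [64x64x3] with 6 filters of size 3x3, padding 1 and stride 1 -> [64x64x6]
side = conv_output_size(64, 3, pad=1, stride=1)
print(side, side, 6)   # 64 64 6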
The diagram below represents the typical architecture of a CNN −

Creating CNN structure
We have seen the architecture and the basics of CNNs; now we are going to build a convolutional network using CNTK. Here, we will first see how to put together the structure of the CNN, and then we will look at how to train its parameters.
At last we’ll see how we can improve the neural network by changing its structure with various different layer setups. We are going to use the MNIST image dataset.
So, first let’s create a CNN structure. Generally, when we build a CNN for recognizing patterns in images, we do the following −
- We use a combination of convolution and pooling layers.
- We add one or more hidden layers at the end of the network.
- At last, we finish the network with a softmax layer for classification purposes.
With the help of the following steps, we can build the network structure −
Step 1 − First, we need to import the required layers for the CNN.
from cntk.layers import Convolution2D, Sequential, Dense, MaxPooling
Step 2 − Next, we need to import the activation functions for the CNN.
from cntk.ops import log_softmax, relu
Step 3 − After that, in order to initialize the convolutional layers later, we need to import glorot_uniform as follows −
from cntk.initializer import glorot_uniform
Step 4 − Next, to create input variables, import the input_variable function. And import the default_options function to make configuration of the NN a bit easier.
from cntk import input_variable, default_options
Step 5 − Now, to store the input images, create a new input_variable. It will contain three channels, namely red, green and blue, and a size of 28 by 28 pixels.
features = input_variable((3,28,28))
Step 6 − Next, we need to create another input_variable to store the labels to predict.
labels = input_variable(10)
Step 7 − Now, we need to create the default_options for the NN, and we need to use glorot_uniform as the initialization function.
with default_options(initialization=glorot_uniform, activation=relu):
Step 8 − Next, in order to set the structure of the NN, we need to create a new Sequential layer set.
Step 9 − Now we need to add a Convolution2D layer with a filter_shape of 5 and a strides setting of 1 within the Sequential layer set. Also, enable padding so that the image is padded to retain its original dimensions.
model = Sequential([
Convolution2D(filter_shape=(5,5), strides=(1,1), num_filters=8, pad=True),
Step 10 − Now it’s time to add a MaxPooling layer with a filter_shape of 2 and a strides setting of 2 to compress the image by half.
MaxPooling(filter_shape=(2,2), strides=(2,2)),
Step 11 − Now, as we did in step 9, we need to add another Convolution2D layer with a filter_shape of 5 and a strides setting of 1, this time using 16 filters. Also, enable padding so that the size of the image produced by the previous pooling layer is retained.
Convolution2D(filter_shape=(5,5), strides=(1,1), num_filters=16, pad=True),
Step 12 − Now, as we did in step 10, add another MaxPooling layer with a filter_shape of 3 and a strides setting of 3 to reduce the image to a third (a quick size check follows the code below).
MaxPooling(filter_shape=(3,3), strides=(3,3)),
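As a quick sanity check, we can trace the spatial sizes through the two pooling layers with the standard pooling output formula (a sketch; the helper name is illustrative, not part of the tutorial's code):

def pool_output_size(n, f, stride):
   # standard formula: floor((n - f) / stride) + 1
   return (n - f) // stride + 1

print(pool_output_size(28, 2, 2))   # 14 - after the MaxPooling layer in step 10
print(pool_output_size(14, 3, 3))   # 4  - after the MaxPooling layer in step 12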
Step 13 − At last, add a Dense layer with ten neurons for the 10 possible classes the network can predict. In order to turn the network into a classification model, use a log_softmax activation function.
Dense(10, activation=log_softmax)
])
Complete Example for creating CNN structure
from cntk.layers import Convolution2D, Sequential, Dense, MaxPooling
from cntk.ops import log_softmax, relu
from cntk.initializer import glorot_uniform
from cntk import input_variable, default_options
features = input_variable((3,28,28))
labels = input_variable(10)
with default_options(initialization=glorot_uniform, activation=relu):
   model = Sequential([
      Convolution2D(filter_shape=(5,5), strides=(1,1), num_filters=8, pad=True),
      MaxPooling(filter_shape=(2,2), strides=(2,2)),
      Convolution2D(filter_shape=(5,5), strides=(1,1), num_filters=16, pad=True),
      MaxPooling(filter_shape=(3,3), strides=(3,3)),
      Dense(10, activation=log_softmax)
   ])
z = model(features)
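As an optional sanity check (not part of the original example), we can verify that the model produces one value per class:

print(z.shape)   # expected: (10,)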
Training CNN with images
As we have created the structure of the network, it’s time to train it. But before starting the training of our network, we need to set up minibatch sources, because training a NN that works with images requires more memory than most computers have.
We have already created minibatch sources in previous sections. Following is the Python code to set up the minibatch sources −
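This create_datasource helper is the same one used in the complete implementation example below. Note that the scaling transform sits outside the if train block, so that test images are also resized to the 3×28×28 input the network expects, while random cropping is applied during training only.

import os
from cntk.io import MinibatchSource, StreamDef, StreamDefs, ImageDeserializer, INFINITELY_REPEAT
import cntk.io.transforms as xforms

def create_datasource(folder, train=True, max_sweeps=INFINITELY_REPEAT):
   mapping_file = os.path.join(folder, 'mapping.bin')
   image_transforms = []
   if train:
      # Random cropping is used only during training, for augmentation
      image_transforms += [
         xforms.crop(crop_type='randomside', side_ratio=0.8)
      ]
   # Scaling to the 3x28x28 network input is needed for training and testing alike
   image_transforms += [
      xforms.scale(width=28, height=28, channels=3, interpolations='linear')
   ]
   stream_definitions = StreamDefs(
      features=StreamDef(field='image', transforms=image_transforms),
      labels=StreamDef(field='label', shape=10)
   )
   deserializer = ImageDeserializer(mapping_file, stream_definitions)
   return MinibatchSource(deserializer, max_sweeps=max_sweeps)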
As we have the create_datasource function, we can now create two separate data sources (a training one and a testing one) to train the model.
train_datasource = create_datasource('mnist_train')
test_datasource = create_datasource('mnist_test', max_sweeps=1, train=False)
Now, as we have prepared the images, we can start training our NN. As we did in previous sections, we can use the train method on the loss function to kick off the training. Following is the code for this −
from cntk import Function
from cntk.losses import cross_entropy_with_softmax
from cntk.metrics import classification_error
from cntk.learners import sgd
@Function
def criterion_factory(output, targets):
   loss = cross_entropy_with_softmax(output, targets)
   metric = classification_error(output, targets)
   return loss, metric
loss = criterion_factory(z, labels)
learner = sgd(z.parameters, lr=0.2)
With the help of the previous code, we have set up the loss and learner for the NN. The following code will train and validate the NN −
from cntk.logging import ProgressPrinter
from cntk.train import TestConfig
progress_writer = ProgressPrinter(0)
test_config = TestConfig(test_datasource)
input_map = {
   features: train_datasource.streams.features,
   labels: train_datasource.streams.labels
}
loss.train(train_datasource,
   max_epochs=10,
   minibatch_size=64,
   epoch_size=60000,
   parameter_learners=[learner],
   model_inputs_to_streams=input_map,
   callbacks=[progress_writer, test_config])
Complete Implementation Example
from cntk.layers import Convolution2D, Sequential, Dense, MaxPooling
from cntk.ops import log_softmax, relu
from cntk.initializer import glorot_uniform
from cntk import input_variable, default_options

# Input variables for the 3-channel 28x28 images and the 10 class labels
features = input_variable((3,28,28))
labels = input_variable(10)

# Network structure: two convolution/pooling pairs followed by a classification layer
with default_options(initialization=glorot_uniform, activation=relu):
   model = Sequential([
      Convolution2D(filter_shape=(5,5), strides=(1,1), num_filters=8, pad=True),
      MaxPooling(filter_shape=(2,2), strides=(2,2)),
      Convolution2D(filter_shape=(5,5), strides=(1,1), num_filters=16, pad=True),
      MaxPooling(filter_shape=(3,3), strides=(3,3)),
      Dense(10, activation=log_softmax)
   ])
z = model(features)

import os
from cntk.io import MinibatchSource, StreamDef, StreamDefs, ImageDeserializer, INFINITELY_REPEAT
import cntk.io.transforms as xforms

def create_datasource(folder, train=True, max_sweeps=INFINITELY_REPEAT):
   mapping_file = os.path.join(folder, 'mapping.bin')
   image_transforms = []
   if train:
      # Random cropping is used only during training, for augmentation
      image_transforms += [
         xforms.crop(crop_type='randomside', side_ratio=0.8)
      ]
   # Scaling to the 3x28x28 network input is needed for training and testing alike
   image_transforms += [
      xforms.scale(width=28, height=28, channels=3, interpolations='linear')
   ]
   stream_definitions = StreamDefs(
      features=StreamDef(field='image', transforms=image_transforms),
      labels=StreamDef(field='label', shape=10)
   )
   deserializer = ImageDeserializer(mapping_file, stream_definitions)
   return MinibatchSource(deserializer, max_sweeps=max_sweeps)

train_datasource = create_datasource('mnist_train')
test_datasource = create_datasource('mnist_test', max_sweeps=1, train=False)

from cntk import Function
from cntk.losses import cross_entropy_with_softmax
from cntk.metrics import classification_error
from cntk.learners import sgd

# Combined criterion: cross-entropy as the loss, classification error as the metric
@Function
def criterion_factory(output, targets):
   loss = cross_entropy_with_softmax(output, targets)
   metric = classification_error(output, targets)
   return loss, metric

loss = criterion_factory(z, labels)
learner = sgd(z.parameters, lr=0.2)

from cntk.logging import ProgressPrinter
from cntk.train import TestConfig

progress_writer = ProgressPrinter(0)
test_config = TestConfig(test_datasource)
input_map = {
   features: train_datasource.streams.features,
   labels: train_datasource.streams.labels
}
loss.train(train_datasource,
   max_epochs=10,
   minibatch_size=64,
   epoch_size=60000,
   parameter_learners=[learner],
   model_inputs_to_streams=input_map,
   callbacks=[progress_writer, test_config])
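Note that this example assumes the mnist_train and mnist_test folders contain the MNIST images along with a mapping.bin mapping file. For ImageDeserializer, a mapping file lists one image per line as a tab-separated file path and numeric label; a hypothetical line could look like this:

/data/mnist_train/img_0001.png	7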
Image transformations
As we have seen, it’s difficult to train NNs used for image recognition, and they also require a lot of data to train. One more issue is that they tend to overfit on the images used during training. Let us see with an example: when we have photos of faces in an upright position, our model will have a hard time recognizing faces that are rotated in another direction.
In order to overcome such problems, we can use image augmentation, and CNTK supports specific transforms when creating minibatch sources for images. We can use several transformations, as follows −
- We can randomly crop images used for training with just a few lines of code.
- We can also use scale and color transformations (see the sketch at the end of this section).
Let’s see, with the help of the following Python code, how we can change the list of transformations by including a cropping transformation within the function used to create the minibatch source earlier.
import os
from cntk.io import MinibatchSource, StreamDef, StreamDefs, ImageDeserializer, INFINITELY_REPEAT
import cntk.io.transforms as xforms

def create_datasource(folder, train=True, max_sweeps=INFINITELY_REPEAT):
   mapping_file = os.path.join(folder, 'mapping.bin')
   image_transforms = []
   if train:
      # The cropping transform: randomly crop a side of the image during training
      image_transforms += [
         xforms.crop(crop_type='randomside', side_ratio=0.8)
      ]
   # Scaling to the 3x28x28 network input is applied to training and test images alike
   image_transforms += [
      xforms.scale(width=28, height=28, channels=3, interpolations='linear')
   ]
   stream_definitions = StreamDefs(
      features=StreamDef(field='image', transforms=image_transforms),
      labels=StreamDef(field='label', shape=10)
   )
   deserializer = ImageDeserializer(mapping_file, stream_definitions)
   return MinibatchSource(deserializer, max_sweeps=max_sweeps)
With the help of the above code, we can enhance the function to include a set of image transforms so that, when we are training, we can randomly crop the images and thus get more variations of them.
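To also vary color, the training transforms could be extended as in the sketch below. This uses the cntk.io.transforms.color transform, which randomly adjusts brightness, contrast and saturation; the radius values shown are illustrative, not taken from the tutorial.

if train:
   image_transforms += [
      xforms.crop(crop_type='randomside', side_ratio=0.8),
      # Randomly vary brightness, contrast and saturation during training
      xforms.color(brightness_radius=0.3, contrast_radius=0.3, saturation_radius=0.3)
   ]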