Microsoft Cognitive Toolkit Tutorial
CNTK - Recurrent Neural Network
Now, let us understand how to construct a Recurrent Neural Network (RNN) in CNTK.
Introduction
We learned how to classify images with a neural network, which is one of the iconic jobs in deep learning. But another area where neural networks excel, and where a lot of research is happening, is recurrent neural networks (RNNs). Here, we are going to learn what an RNN is and how it can be used in scenarios where we need to deal with time-series data.
What is Recurrent Neural Network?
Recurrent neural networks (RNNs) may be defined as a special breed of NNs that are capable of reasoning over time. RNNs are mainly used in scenarios where we need to deal with values that change over time, i.e. time-series data. In order to understand this better, let's make a small comparison between regular neural networks and recurrent neural networks −
- As we know, in a regular neural network we can provide only one input. This limits it to producing only one prediction. For example, we cannot do a job like translating text by using regular neural networks.
- On the other hand, in recurrent neural networks we can provide a sequence of samples that results in a single prediction. In other words, using RNNs we can predict an output sequence based on an input sequence. For example, there have been quite a few successful experiments with RNNs in translation tasks.
Uses of Recurrent Neural Network
RNNs can be used in several ways. Some of them are as follows −
Predicting a single output
Before diving deep into the steps of how an RNN can predict a single output based on a sequence, let's see what a basic RNN looks like −

As we can see in the above diagram, an RNN contains a loopback connection to the input, and whenever we feed it a sequence of values, it processes each element in the sequence as a time step.
Moreover, because of the loopback connection, an RNN can combine the generated output with the input for the next element in the sequence. In this way, the RNN builds up a memory over the whole sequence, which can be used to make a prediction.
In order to make a prediction with an RNN, we can perform the following steps, which are also sketched in code below −
- First, to create an initial hidden state, we need to feed the first element of the input sequence.
- After that, to produce an updated hidden state, we need to take the initial hidden state and combine it with the second element in the input sequence.
- At last, to produce the final hidden state and to predict the output of the RNN, we need to combine the updated hidden state with the final element in the input sequence.
In this way, with the help of this loopback connection, we can teach an RNN to recognize patterns that happen over time.
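These steps can be summarized in a short Python sketch. The helpers rnn_step and predict are hypothetical placeholders for the recurrent layer and the output layer, passed in as arguments purely for illustration −

def predict_single_output(sequence, initial_hidden_state, rnn_step, predict):
    # The first element of the sequence is used to create the initial hidden state.
    hidden = rnn_step(sequence[0], initial_hidden_state)
    # The loopback connection combines each following element with the previous hidden state.
    for element in sequence[1:]:
        hidden = rnn_step(element, hidden)
    # The final hidden state is used to make the single prediction.
    return predict(hidden)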
Predicting a sequence
The basic RNN model discussed above can be extended to other use cases as well. For example, we can use it to predict a sequence of values based on a single input. In this scenario, in order to make a prediction with the RNN, we can perform the following steps, which are sketched in code after the list −
- First, to create an initial hidden state and predict the first element in the output sequence, we need to feed an input sample into the neural network.
- After that, to produce an updated hidden state and the second element in the output sequence, we need to combine the initial hidden state with the same sample.
- At last, to update the hidden state one more time and predict the final element in the output sequence, we feed the sample another time.
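Again as a hypothetical sketch, with rnn_step and predict as placeholder arguments, the same input sample is fed repeatedly while the hidden state evolves, producing one output element per step −

def predict_output_sequence(sample, initial_hidden_state, rnn_step, predict, steps=3):
    outputs, hidden = [], initial_hidden_state
    for _ in range(steps):
        # The same sample is combined with the evolving hidden state at every step.
        hidden = rnn_step(sample, hidden)
        outputs.append(predict(hidden))
    return outputs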
Predicting sequences
We have seen how to predict a single value based on a sequence and how to predict a sequence based on a single value. Now let's see how we can predict a sequence from a sequence. In this scenario, in order to make a prediction with the RNN, we can perform the following steps (again sketched in code after the list) −
- First, to create an initial hidden state and predict the first element in the output sequence, we need to take the first element in the input sequence.
- After that, to update the hidden state and predict the second element in the output sequence, we need to combine the initial hidden state with the second element in the input sequence.
- At last, to predict the final element in the output sequence, we need to take the updated hidden state and the final element in the input sequence.
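As a final hypothetical sketch, every element of the input sequence updates the hidden state and produces the corresponding element of the output sequence −

def predict_sequence_to_sequence(sequence, initial_hidden_state, rnn_step, predict):
    outputs, hidden = [], initial_hidden_state
    for element in sequence:
        hidden = rnn_step(element, hidden)
        outputs.append(predict(hidden))
    return outputs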
Working of RNN
To understand the working of recurrent neural networks (RNNs), we first need to understand how recurrent layers in the network work. So let's first discuss how we can predict the output with a standard recurrent layer.
Predicting output with standard RNN layer
As we discussed earlier, a basic layer in an RNN is quite different from a regular layer in a neural network. In the previous section, we also demonstrated the basic architecture of an RNN in a diagram. In order to update the hidden state for the first time step in the sequence, we can use the following formula −
h1 = f(x1 · W_in + h0 · W_rec)
In the above equation, we calculate the new hidden state by taking the dot product between the initial hidden state and a set of recurrent weights, combining it with the weighted first element of the input, and passing the result through the activation function.
Now, for the next step, the hidden state of the current time step is used as the initial hidden state for the next time step in the sequence. That's why, to update the hidden state for the second time step, we can repeat the calculation performed in the first time step, as follows −
h2 = f(x2 · W_in + h1 · W_rec)
Next, we can repeat the process of updating the hidden state for the third and final step in the sequence as below −
h3 = f(x3 · W_in + h2 · W_rec)
And when we have processed all the above steps in the sequence, we can calculate the output as follows −
y = h3 · W_out
For the above formula, we have used a third set of weights, W_out, and the hidden state from the final time step.
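Putting the above formulas together, a standard recurrent layer can be sketched in a few lines of NumPy. The names W_in, W_rec and W_out are assumptions matching the three sets of weights used in the formulas, and tanh stands in for the activation function f −

import numpy as np

def standard_rnn_layer(inputs, W_in, W_rec, W_out, h0):
    # inputs is a list of input vectors, one per time step.
    h = h0
    for x in inputs:
        # Update the hidden state from the current element and the previous hidden state.
        h = np.tanh(x @ W_in + h @ W_rec)
    # The output is computed from the hidden state of the final time step.
    return h @ W_out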
Advanced Recurrent Units
The main issue with the basic recurrent layer is the vanishing gradient problem, because of which it is not very good at learning long-term correlations. In simple words, the basic recurrent layer does not handle long sequences very well. For that reason, some other recurrent layer types that are much better suited for working with longer sequences are as follows −
Long Short-Term Memory (LSTM)

Long Short-Term Memory (LSTM) networks were introduced by Hochreiter and Schmidhuber. They solved the problem of getting a basic recurrent layer to remember things for a long time. The architecture of an LSTM is given in the diagram above. As we can see, it has input neurons, memory cells, and output neurons. In order to combat the vanishing gradient problem, LSTM networks use an explicit memory cell (which stores the previous values) and the following gates −
- Forget gate − As the name implies, it tells the memory cell to forget the previous values. The memory cell stores the values until the gate, i.e. the 'forget gate', tells it to forget them.
- Input gate − As the name implies, it adds new stuff to the cell.
- Output gate − As the name implies, it decides when to pass along the vectors from the cell to the next hidden state.
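These gates are usually written down with the following standard equations, which are not taken from the diagram above but summarize the behaviour just described: each gate is a sigmoid of the current input x_t and the previous hidden state h_(t-1), the memory cell c_t mixes the old cell contents with new candidate values, and h_t is the new hidden state −

f_t = sigmoid(W_f · [h_(t-1), x_t] + b_f)   (forget gate)
i_t = sigmoid(W_i · [h_(t-1), x_t] + b_i)   (input gate)
o_t = sigmoid(W_o · [h_(t-1), x_t] + b_o)   (output gate)
c_t = f_t * c_(t-1) + i_t * tanh(W_c · [h_(t-1), x_t] + b_c)
h_t = o_t * tanh(c_t)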
Gated Recurrent Units (GRUs)

Gated Recurrent Units (GRUs) are a slight variation of the LSTM network. They have one less gate and are wired slightly differently than LSTMs. Their architecture is shown in the diagram above. They have input neurons, gated memory cells, and output neurons. A Gated Recurrent Unit network has the following two gates −
- Update gate − It determines the following two things: how much of the information from the last state should be kept, and how much of the information from the previous layer should be let in.
- Reset gate − The functionality of the reset gate is much like that of the forget gate of the LSTM network. The only difference is that it is located slightly differently.
In contrast to Long Short-Term Memory networks, Gated Recurrent Unit networks are slightly faster and easier to run.
Creating RNN structure
Before we can start making predictions about the output from any of our data sources, we first need to construct the RNN, and constructing an RNN is quite similar to how we built a regular neural network in the previous section. Following is the code to build one −
from cntk.losses import squared_error
from cntk.io import CTFDeserializer, MinibatchSource, INFINITELY_REPEAT, StreamDefs, StreamDef
from cntk.learners import adam
from cntk.logging import ProgressPrinter
from cntk.train import TestConfig

BATCH_SIZE = 14 * 10   # number of samples per minibatch
EPOCH_SIZE = 12434     # number of samples that make up one epoch
EPOCHS = 10            # number of epochs to train for
Stacking multiple layers
We can also stack multiple recurrent layers in CNTK. For example, we can use the following combination of layers−
from cntk import sequence, default_options, input_variable
from cntk.layers import Recurrence, LSTM, Dropout, Dense, Sequential, Fold

features = sequence.input_variable(1)

with default_options(initial_state = 0.1):
    model = Sequential([
        Fold(LSTM(15)),
        Dense(1)
    ])(features)

target = input_variable(1, dynamic_axes=model.dynamic_axes)
As we can see in the above code, we have the following two ways in which we can model RNN in CNTK −
- First, if we only want the final output of a recurrent layer, we can use the Fold layer in combination with a recurrent layer, such as GRU, LSTM, or even RNNStep.
- Second, as an alternative, we can also use the Recurrence block, as sketched below.
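For example, if we want the model to produce an output for every element of the input sequence instead of only the final value, we can swap Fold for Recurrence. A minimal sketch, reusing the features variable defined above −

from cntk.layers import Recurrence, LSTM, Dense, Sequential

# Recurrence applies the LSTM step to every element of the input sequence,
# so this model emits one value per time step instead of a single value.
seq_model = Sequential([
    Recurrence(LSTM(15)),
    Dense(1)
])(features)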
Training RNN with time series data
Once we have built the model, let's see how we can train the RNN in CNTK −
from cntk import Function

@Function
def criterion_factory(z, t):
    loss = squared_error(z, t)
    metric = squared_error(z, t)
    return loss, metric

loss = criterion_factory(model, target)
learner = adam(model.parameters, lr=0.005, momentum=0.9)
Now, to load the data into the training process, we have to deserialize sequences from a set of CTF files. The following code contains the create_datasource function, which is a useful utility function for creating both the training and the test data source.
def create_datasource(filename, sweeps=INFINITELY_REPEAT):
    target_stream = StreamDef(field='target', shape=1, is_sparse=False)
    features_stream = StreamDef(field='features', shape=1, is_sparse=False)
    deserializer = CTFDeserializer(filename, StreamDefs(features=features_stream, target=target_stream))
    datasource = MinibatchSource(deserializer, randomize=True, max_sweeps=sweeps)
    return datasource

train_datasource = create_datasource('Training data filename.ctf')  # location of the training file created from our dataset
test_datasource = create_datasource('Test filename.ctf', sweeps=1)  # location of the test file created from our dataset
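For reference, the CTF (CNTK Text Format) files read by the deserializer map a sequence ID to named streams on each line. A purely hypothetical fragment with the 'features' and 'target' streams used above might look like this −

0 |features 0.46 |target 0.49
0 |features 0.51 |target 0.53
1 |features 0.12 |target 0.15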
Now that we have set up the data sources, the model, and the loss function, we can start the training process. It is quite similar to what we did in previous sections with basic neural networks.
progress_writer = ProgressPrinter(0)
test_config = TestConfig(test_datasource)

input_map = {
    features: train_datasource.streams.features,
    target: train_datasource.streams.target
}

history = loss.train(
    train_datasource,
    epoch_size=EPOCH_SIZE,
    parameter_learners=[learner],
    model_inputs_to_streams=input_map,
    callbacks=[progress_writer, test_config],
    minibatch_size=BATCH_SIZE,
    max_epochs=EPOCHS
)
We will get output similar to the following −
Validating the model
Actually, predicting with an RNN is quite similar to making predictions with any other CNTK model. The only difference is that we need to provide sequences rather than single samples.
Now, as our RNN is finally done with training, we can validate the model by testing it with a few sample sequences, as follows −
import pickle

with open('test_samples.pkl', 'rb') as test_file:
    test_samples = pickle.load(test_file)

# Scale the predictions back up; NORMALIZE is assumed to be the constant
# that was used to normalize the data during preprocessing.
model(test_samples) * NORMALIZE