Microsoft Cognitive Toolkit: A Concise Tutorial

CNTK - Sequence Classification

In this chapter, we take a detailed look at sequences in CNTK and how to classify them.


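CNTK works with tensors. Essentially, CNTK inputs, outputs, and parameters are organized as tensors, which can be thought of as generalized matrices. Every tensor has a rank:

  1. A tensor of rank 0 is a scalar.

  2. A tensor of rank 1 is a vector.

  3. A tensor of rank 2 is a matrix.

Here, these different dimensions are referred to as axes.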
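To make the notion of rank concrete, here is a minimal sketch in plain Python that counts the axes of a tensor stored as nested lists. The helpers `rank` and `shape` are hypothetical names introduced for this illustration and are not part of CNTK; the sketch assumes non-empty, rectangular lists.

```python
def rank(value):
    """Return the number of axes of a nested-list tensor (0 for a scalar)."""
    depth = 0
    while isinstance(value, list):
        depth += 1
        value = value[0]  # descend along the first element of each axis
    return depth


def shape(value):
    """Return the length of every axis, outermost first."""
    dims = []
    while isinstance(value, list):
        dims.append(len(value))
        value = value[0]
    return tuple(dims)


scalar = 3.0                                    # rank 0
vector = [1.0, 2.0, 3.0]                        # rank 1
matrix = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # rank 2

for name, tensor in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    print(name, "rank:", rank(tensor), "shape:", shape(tensor))
```

Each level of nesting corresponds to one axis, so the matrix above has two axes of length 3 and 2.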

Static axes and Dynamic axes




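To make this clearer, consider how a minibatch of short video clips is represented in CNTK. Suppose the clips all have a resolution of 640 * 480 and are shot in color, which is typically encoded with three channels. The minibatch then has the following properties:

  1. Three static axes of length 640, 480, and 3.

  2. Two dynamic axes: the length of the video and the minibatch axis.

This means that a minibatch of 16 videos, each 240 frames long, is represented as a 16*240*3*640*480 tensor.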
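The arithmetic behind that shape can be sketched in plain Python. The variable names below are chosen for this illustration only, and the 4 bytes per element is an assumption (32-bit floats), not something the text specifies.

```python
import math

# Static axes: fixed for every frame (channels, then the 640 * 480 resolution).
static_axes = (3, 640, 480)

# Dynamic axes: these can differ from one minibatch to the next.
clips_per_minibatch = 16
frames_per_clip = 240

full_shape = (clips_per_minibatch, frames_per_clip) + static_axes
total_elements = math.prod(full_shape)

# Assumption: each element is stored as a 32-bit float (4 bytes).
bytes_as_float32 = total_elements * 4

print("shape:", full_shape)
print("elements:", total_elements)
print("size in GiB (float32):", round(bytes_as_float32 / 2**30, 2))
```

Only the first two entries of the shape depend on the data; the last three stay the same no matter how many clips or frames a minibatch contains.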

Working with sequences in CNTK


Long Short-Term Memory Network (LSTM)

A long short-term memory (LSTM) network controls its memory cell through three gates:


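  1. Forget gate - As the name implies, it tells the memory cell to forget its previous values. The memory cell stores values until the forget gate tells it to forget them.

  2. Input gate - As the name implies, it adds new content to the cell.

  3. Output gate - As the name implies, it decides when to pass the vectors from the cell on to the next hidden state.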
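The interplay of the three gates can be sketched with a single LSTM step on scalar values, using the standard LSTM update equations. This is an illustration only and not CNTK's implementation; the function `lstm_step`, the weight layout, and all weight values are assumptions made for the example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, weights):
    """One LSTM step on scalars.

    weights maps a gate name to a (w_x, w_h, bias) triple (hypothetical layout).
    Returns the new hidden state and the new cell state.
    """
    def pre_activation(name):
        w_x, w_h, bias = weights[name]
        return w_x * x + w_h * h_prev + bias

    forget = sigmoid(pre_activation("forget"))         # how much old memory to keep
    write = sigmoid(pre_activation("input"))           # how much new content to add
    expose = sigmoid(pre_activation("output"))         # how much of the cell to pass on
    candidate = math.tanh(pre_activation("candidate")) # proposed new content

    c = forget * c_prev + write * candidate
    h = expose * math.tanh(c)
    return h, c


# Arbitrary example weights, chosen only to make the sketch runnable.
example_weights = {
    "forget": (0.5, 0.1, 1.0),
    "input": (0.4, 0.2, 0.0),
    "output": (0.3, 0.3, 0.0),
    "candidate": (0.8, 0.5, 0.0),
}

h, c = 0.0, 0.0
for x in [0.5, -1.0, 0.25]:
    h, c = lstm_step(x, h, c, example_weights)
    print("x=%.2f  h=%.4f  c=%.4f" % (x, h, c))
```

A forget gate close to 0 erases the previous cell value, while a forget gate close to 1 combined with an input gate close to 0 carries it forward unchanged.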


# Train an LSTM-based sequence classifier with CNTK.
import sys
import os
from cntk import Trainer, Axis
from cntk.io import MinibatchSource, CTFDeserializer, StreamDef, StreamDefs, \
   INFINITELY_REPEAT
from cntk.learners import sgd, learning_parameter_schedule_per_sample
from cntk import input_variable, cross_entropy_with_softmax, \
   classification_error, sequence
from cntk.logging import ProgressPrinter
from cntk.layers import Sequential, Embedding, Recurrence, LSTM, Dense


# Build a reader for a CTF file with a sparse feature stream 'x'
# and a dense label stream 'y'.
def create_reader(path, is_training, input_dim, label_dim):
   return MinibatchSource(CTFDeserializer(path, StreamDefs(
      features=StreamDef(field='x', shape=input_dim, is_sparse=True),
      labels=StreamDef(field='y', shape=label_dim, is_sparse=False)
   )), randomize=is_training,
   max_sweeps=INFINITELY_REPEAT if is_training else 1)


# Embedding -> LSTM recurrence -> last step of the sequence -> dense output.
def LSTM_sequence_classifier_net(input, num_output_classes, embedding_dim,
                                 LSTM_dim, cell_dim):
   lstm_classifier = Sequential([Embedding(embedding_dim),
      Recurrence(LSTM(LSTM_dim, cell_dim)),
      sequence.last,
      Dense(num_output_classes)])
   return lstm_classifier(input)


def train_sequence_classifier():
   input_dim = 2000
   cell_dim = 25
   hidden_dim = 25
   embedding_dim = 50
   num_output_classes = 5

   # The features are sequences; each sequence has a single label.
   features = sequence.input_variable(shape=input_dim, is_sparse=True)
   label = input_variable(num_output_classes)
   classifier_output = LSTM_sequence_classifier_net(
      features, num_output_classes, embedding_dim, hidden_dim, cell_dim)
   ce = cross_entropy_with_softmax(classifier_output, label)
   pe = classification_error(classifier_output, label)

   # NOTE: the path is truncated in the source text; append the sub-path
   # of the training CTF file before running.
   rel_path = "../../../Tests/EndToEndTests/Text/"
   path = os.path.join(os.path.dirname(os.path.abspath(__file__)), rel_path)
   reader = create_reader(path, True, input_dim, num_output_classes)
   input_map = {
      features: reader.streams.features,
      label: reader.streams.labels
   }

   lr_per_sample = learning_parameter_schedule_per_sample(0.0005)
   progress_printer = ProgressPrinter(0)
   trainer = Trainer(classifier_output, (ce, pe),
      sgd(classifier_output.parameters, lr=lr_per_sample), progress_printer)

   minibatch_size = 200
   for i in range(255):
      mb = reader.next_minibatch(minibatch_size, input_map=input_map)
      trainer.train_minibatch(mb)

   # Report the metric and loss of the last minibatch.
   evaluation_average = float(trainer.previous_minibatch_evaluation_average)
   loss_average = float(trainer.previous_minibatch_loss_average)
   return evaluation_average, loss_average


if __name__ == '__main__':
   error, _ = train_sequence_classifier()
   print(" error: %f" % error)
Running the script prints training progress similar to the following:

average   since   average   since   examples
   loss    last    metric    last
   1.61    1.61     0.886   0.886         44
   1.61     1.6     0.714   0.629        133
    1.6    1.59      0.56   0.448        316
   1.57    1.55     0.479    0.41        682
   1.53     1.5     0.464   0.449       1379
   1.46     1.4     0.453   0.441       2813
   1.37    1.28      0.45   0.447       5679
    1.3    1.23     0.448   0.447      11365

error: 0.333333
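The reader in the listing above consumes a file in CNTK Text Format (CTF), where each line carries an optional sequence ID followed by `|name values` fields, and sparse values are written as `index:value` pairs. The sketch below parses one such line in plain Python to show what the `x` and `y` streams look like. The function `parse_ctf_line` and the sample line are hypothetical illustrations and not part of CNTK or its test data.

```python
def parse_ctf_line(line):
    """Parse one CTF line into (sequence_id, {stream_name: values}).

    Sparse streams become {index: value} dicts; dense streams become lists.
    """
    head, *fields = line.strip().split("|")
    sequence_id = int(head) if head.strip() else None
    streams = {}
    for field in fields:
        name, *tokens = field.split()
        if any(":" in token for token in tokens):
            # Sparse encoding: index:value pairs.
            streams[name] = {
                int(index): float(value)
                for index, value in (token.split(":") for token in tokens)
            }
        else:
            # Dense encoding: one value per position.
            streams[name] = [float(token) for token in tokens]
    return sequence_id, streams


# Hypothetical sample: sequence 0, one active feature index, a 5-class one-hot label.
sample_line = "0 |x 560:1 |y 1 0 0 0 0"
sequence_id, streams = parse_ctf_line(sample_line)
print(sequence_id, streams)
```

Lines that share a sequence ID belong to the same sequence, which is how a variable-length input is spread over several rows of the file.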
