Microsoft Cognitive Toolkit - Quick Guide

Microsoft Cognitive Toolkit (CNTK) - Introduction

This chapter explains what CNTK is, describes its features, compares versions 1.0 and 2.0, and lists the important highlights of version 2.7.

What is Microsoft Cognitive Toolkit (CNTK)?

Microsoft Cognitive Toolkit (CNTK), formerly known as the Computational Network Toolkit, is a free, easy-to-use, open-source, commercial-grade toolkit for training deep learning algorithms to learn like the human brain. With it we can build popular deep learning systems such as feed-forward neural network time series prediction systems and convolutional neural network (CNN) image classifiers.

For optimal performance, the framework functions are written in C++. Although we can call them directly from C++, the most common approach is to use a Python program.

CNTK’s Features

The following are some of the features and capabilities offered in the latest version of Microsoft CNTK:

Built-in components

  1. CNTK has highly optimised built-in components that can handle multi-dimensional dense or sparse data from Python, C++ or BrainScript.

  2. We can implement CNN, FNN, RNN, Batch Normalisation and Sequence-to-Sequence with attention.

  3. It provides us the functionality to add new user-defined core-components on the GPU from Python.

  4. It also provides automatic hyperparameter tuning.

  5. We can implement Reinforcement learning, Generative Adversarial Networks (GANs), Supervised as well as Unsupervised learning.

  6. For massive datasets, CNTK has built-in optimised readers.

Usage of resources efficiently

  1. CNTK provides us parallelism with high accuracy on multiple GPUs/machines via 1-bit SGD.

  2. To fit the largest models in GPU memory, it provides memory sharing and other built-in methods.

Express our own networks easily

  1. CNTK has full APIs for defining your own network, learners, readers, training and evaluation from Python, C++, and BrainScript.

  2. Using CNTK, we can easily evaluate models with Python, C++, C# or BrainScript.

  3. It provides both high-level as well as low-level APIs.

  4. It can automatically infer shapes based on our data.

  5. It has fully optimised symbolic Recurrent Neural Network (RNN) loops.

Measuring model performance

  1. CNTK provides various components to measure the performance of neural networks you build.

  2. CNTK generates log data from your model and the associated optimiser, which we can use to monitor the training process.

Version 1.0 vs Version 2.0

The following table compares CNTK version 1.0 and version 2.0:

Version 1.0 | Version 2.0
It was released in 2016. | It is a significant rewrite of version 1.0 and was released in June 2017.
It used a proprietary scripting language called BrainScript. | Its framework functions can be called from C++ and Python. We can easily load our modules in C# or Java. BrainScript is also supported by version 2.0.
It runs on both Windows and Linux systems but not directly on Mac OS. | It also runs on both Windows (Win 8.1, Win 10, Server 2012 R2 and later) and Linux systems but not directly on Mac OS.

Important Highlights of Version 2.7

Version 2.7 is the last major release of Microsoft Cognitive Toolkit. The following are some important highlights of this release:

  1. Full support for ONNX 1.4.1.

  2. Support for CUDA 10 for both Windows and Linux systems.

  3. It supports advanced Recurrent Neural Network (RNN) loops in ONNX export.

  4. It can export models larger than 2 GB in ONNX format.

  5. It supports FP16 in BrainScript scripting language’s training action.

Microsoft Cognitive Toolkit (CNTK) - Getting Started

This chapter covers installing CNTK on Windows and on Linux. It also explains how to install the CNTK package, the steps to install Anaconda, the CNTK files and directory structure, and how the CNTK library is organised.

Prerequisites

To install CNTK, Python must already be installed on the computer. Go to https://www.python.org/downloads/ and select the latest version for your operating system (Windows or Linux/Unix). For a basic tutorial on Python, see https://www.tutorialspoint.com/python3/index.htm.


CNTK is supported on Windows as well as Linux, so we will walk through both.

Installing on Windows

To run CNTK on Windows, we will use the Anaconda version of Python. Anaconda is a redistribution of Python that includes additional packages such as SciPy and Scikit-learn, which CNTK uses to perform various useful calculations.

So, first let us see the steps to install Anaconda on your machine −

Step 1 − First, download the setup files from the public website https://www.anaconda.com/distribution/.

Step 2 − Once you have downloaded the setup files, start the installation and follow the instructions at https://docs.anaconda.com/anaconda/install/.

Step 3 − Besides Python, Anaconda installs some other utilities, including a command prompt that automatically includes all the Anaconda executables in the PATH variable. From this prompt we can manage our Python environment, install packages and run Python scripts.

Installing CNTK package

Once the Anaconda installation is done, the most common way to install the CNTK package is through the pip executable, using the following command −

pip install cntk

There are various other methods to install the Cognitive Toolkit on your machine. Microsoft provides clear documentation that explains the other installation methods in detail; see https://docs.microsoft.com/en-us/cognitive-toolkit/Setup-CNTK-on-your-machine.

Installing on Linux

Installing CNTK on Linux is a bit different from installing it on Windows. On Linux we also use Anaconda to install CNTK, but instead of the graphical installer for Anaconda we use a terminal-based installer. Although the installer works with almost all Linux distributions, we limit the description to Ubuntu.

So, first let us see the steps to install Anaconda on your machine −

Steps to install Anaconda

Step 1 − Before installing Anaconda, make sure that the system is fully up to date. To do so, execute the following two commands in a terminal −

sudo apt update
sudo apt upgrade

Step 2 − Once the computer is updated, get the URL of the latest Anaconda installation files from the public website https://www.anaconda.com/distribution/.

Step 3 − Once the URL is copied, open a terminal window and execute the following command −

wget -O anaconda-installer.sh url

Replace the url placeholder with the URL copied from the Anaconda website.

Step 4 − Next, we can install Anaconda with the following command −

sh ./anaconda-installer.sh

By default, the above command installs Anaconda3 inside our home directory.

Installing CNTK package

Once the Anaconda installation is done, the most common way to install the CNTK package is through the pip executable, using the following command −

pip install cntk

Examining CNTK files & directory structure

Once CNTK is installed as a Python package, we can examine its file and directory structure. It is located at C:\Users\ \Anaconda3\Lib\site-packages\cntk, as shown in the screenshot below.

[Screenshot: files and directory structure]

Verifying CNTK installation

Once CNTK is installed as a Python package, you should verify that it has been installed correctly. From the Anaconda command shell, start the Python interpreter by entering ipython. Then import CNTK by entering the following command −

import cntk as c

Once imported, check its version with the following command −

print(c.__version__)

The interpreter will respond with the installed CNTK version. If the import fails or no version is printed, there is a problem with the installation.
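The same check can also be scripted. The following is a minimal sketch that uses only the Python standard library; the helper name installed_cntk_version is an assumption made for this illustration and is not part of CNTK.

```python
import importlib.util


def installed_cntk_version():
    """Return the installed CNTK version string, or None if CNTK is missing."""
    # find_spec only looks the package up; it does not import it yet.
    if importlib.util.find_spec("cntk") is None:
        return None
    import cntk  # imported lazily so the script also runs without CNTK
    return cntk.__version__


version = installed_cntk_version()
if version is None:
    print("CNTK is not installed in this environment")
else:
    print("CNTK version:", version)
```

Run from the Anaconda prompt, the script reports either the version or the fact that the package cannot be found, which helps to tell a missing package apart from a broken one.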

The CNTK library organisation

CNTK, technically a Python package, is organised into 13 high-level sub-packages and 8 smaller sub-packages. The following list covers the 10 most frequently used packages:

  1. cntk.io − Contains functions for reading data. For example: next_minibatch()

  2. cntk.layers − Contains high-level functions for creating neural networks. For example: Dense()

  3. cntk.learners − Contains functions for training. For example: sgd()

  4. cntk.losses − Contains functions to measure training error. For example: squared_error()

  5. cntk.metrics − Contains functions to measure model error. For example: classification_error()

  6. cntk.ops − Contains low-level functions for creating neural networks. For example: tanh()

  7. cntk.random − Contains functions to generate random numbers. For example: normal()

  8. cntk.train − Contains training functions. For example: train_minibatch()

  9. cntk.initializer − Contains model parameter initializers. For example: normal() and uniform()

  10. cntk.variables − Contains low-level constructs. For example: Parameter() and Variable()

Microsoft Cognitive Toolkit (CNTK) - CPU and GPU

Microsoft Cognitive Toolkit offers two different build versions, namely CPU-only and GPU-only.

CPU only build version

The CPU-only build version of CNTK uses the optimised Intel MKLML. MKLML is a subset of MKL (Math Kernel Library) that is released together with Intel MKL-DNN as a version of Intel MKL intended for use by MKL-DNN.

GPU only build version

The GPU-only build version of CNTK, on the other hand, uses highly optimised NVIDIA libraries such as CUB and cuDNN. It supports distributed training across multiple GPUs and multiple machines. For even faster distributed training, the GPU build version also includes −

  1. MSR-developed 1bit-quantized SGD.

  2. Block-momentum SGD parallel training algorithms.
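To give an intuition for 1-bit quantized SGD, the sketch below quantizes a gradient to two values (one bit per element) and carries the quantization error over to the next step. It is an illustrative NumPy sketch of the general idea, not CNTK's implementation; the function name one_bit_quantize and the choice of the per-sign mean as reconstruction value are assumptions.

```python
import numpy as np


def one_bit_quantize(gradient, residual):
    """Quantize a gradient to two values and return the new error residual."""
    # Add the error left over from the previous step (error feedback).
    corrected = gradient + residual
    positive = corrected >= 0
    # Each element is sent as one bit; the receiver reconstructs it as the
    # mean of the positive or of the negative entries.
    pos_value = corrected[positive].mean() if positive.any() else 0.0
    neg_value = corrected[~positive].mean() if (~positive).any() else 0.0
    quantized = np.where(positive, pos_value, neg_value)
    # Whatever was lost by quantizing is remembered for the next step.
    new_residual = corrected - quantized
    return quantized, new_residual


gradient = np.array([0.4, -0.2, 0.1, -0.7])
residual = np.zeros_like(gradient)
quantized, residual = one_bit_quantize(gradient, residual)
print(quantized)
print(residual)
```

Because only one bit per gradient element has to be exchanged between workers, communication shrinks considerably, while the residual keeps the quantization error from being lost.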

Enabling GPU with CNTK on Windows

In the previous section, we saw how to install the basic version of CNTK for use with the CPU. Now let us discuss how to install CNTK for use with a GPU. Before diving in, make sure that you have a supported graphics card.

At present, CNTK supports NVIDIA graphics cards with CUDA compute capability 3.0 or higher. To make sure, you can check at https://developer.nvidia.com/cuda-gpus whether your GPU supports CUDA.

So, let us see the steps to enable GPU with CNTK on Windows −

Step 1 − First, install the latest GeForce or Quadro drivers for the graphics card you are using.

Step 2 − Once the drivers are installed, download the CUDA toolkit version 9.0 for Windows from the NVIDIA website https://developer.nvidia.com/cuda-90-download-archive?target_os=Windows&target_arch=x86_64. After downloading, run the installer and follow the instructions.

Step 3 − Next, download the cuDNN binaries from the NVIDIA website https://developer.nvidia.com/rdp/form/cudnn-download-survey. cuDNN 7.4.1 works well with CUDA 9.0. cuDNN is a layer on top of CUDA that CNTK uses.

Step 4 − After downloading the cuDNN binaries, extract the zip file into the root folder of your CUDA toolkit installation.

Step 5 − This last step enables GPU usage inside CNTK. Execute the following command inside the Anaconda prompt on Windows −

pip install cntk-gpu

Enabling GPU with CNTK on Linux

Let us see how we can enable GPU with CNTK on Linux −

Downloading the CUDA toolkit

First, you need to install the CUDA toolkit from the NVIDIA website: https://developer.nvidia.com/cuda-90-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1604&target_type=runfilelocal

Running the installer

Once the installer binary is on disk, open a terminal, run the installer with the following command and follow the instructions on screen −

sh cuda_9.0.176_384.81_linux-run

Modify Bash profile script

After installing the CUDA toolkit on your Linux machine, you need to modify the Bash profile script. To do so, open the $HOME/.bashrc file in a text editor and add the following lines at the end of the script −

export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64\
${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
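After opening a new terminal, it can be useful to confirm that both variables contain the CUDA 9.0 directories. The following is a small standard-library sketch; the helper name cuda_paths_configured is an assumption for illustration, and the directories are the ones used in the export lines above.

```python
import os

CUDA_BIN = "/usr/local/cuda-9.0/bin"
CUDA_LIB = "/usr/local/cuda-9.0/lib64"


def cuda_paths_configured(env):
    """Return True if both CUDA directories appear in the given environment."""
    # On Linux, entries in PATH-like variables are separated by ':'.
    path_dirs = env.get("PATH", "").split(":")
    lib_dirs = env.get("LD_LIBRARY_PATH", "").split(":")
    return CUDA_BIN in path_dirs and CUDA_LIB in lib_dirs


print("CUDA paths configured:", cuda_paths_configured(os.environ))
```

The function takes the environment as an argument so that it can be checked against any mapping, not only the environment of the current process.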

Installing cuDNN libraries

Finally, we need to install the cuDNN binaries. They can be downloaded from the NVIDIA website https://developer.nvidia.com/rdp/form/cudnn-download-survey. cuDNN 7.4.1 works well with CUDA 9.0. cuDNN is a layer on top of CUDA that CNTK uses.

Once you have downloaded the version for Linux, extract it to the /usr/local/cuda-9.0 folder by using the following command −

tar xvzf cudnn-9.0-linux-x64-v7.4.1.5.tgz -C /usr/local/cuda-9.0/

Change the path and the file name as required.

CNTK - Sequence Classification

In this chapter, we will learn in detail about sequences in CNTK and how to classify them.

Tensors

CNTK is built around the concept of the tensor. CNTK inputs, outputs and parameters are all organised as tensors, which are often thought of as generalised matrices. Every tensor has a rank −

  1. A tensor of rank 0 is a scalar.

  2. A tensor of rank 1 is a vector.

  3. A tensor of rank 2 is a matrix.

The different dimensions of a tensor are referred to as axes.

Static axes and Dynamic axes

As the name implies, static axes have the same length throughout the network’s life. The length of dynamic axes, on the other hand, can vary from instance to instance. In fact, their length is typically not known before each minibatch is presented.

Dynamic axes are like static axes in that they also define a meaningful grouping of the numbers contained in the tensor.

Example

To make this clearer, let us see how a minibatch of short video clips is represented in CNTK. Suppose that all the video clips have a resolution of 640 * 480 and are shot in color, which is typically encoded with three channels. Our minibatch then has the following −

  1. 3 static axes of length 640, 480 and 3 respectively.

  2. Two dynamic axes; the length of the video and the minibatch axes.

This means that a minibatch of 16 videos, each 240 frames long, is represented as a 16*240*3*640*480 tensor.
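The shape arithmetic of this example can be checked with a few lines of plain Python. This is only a sketch of the bookkeeping; it does not allocate the tensor, and the variable names are chosen for illustration.

```python
import math

# Static axes: channels, width and height are the same for every frame.
static_axes = (3, 640, 480)

# Dynamic axes: known only once a concrete minibatch is presented.
num_videos = 16          # minibatch axis
frames_per_video = 240   # sequence (video length) axis

shape = (num_videos, frames_per_video) + static_axes
num_values = math.prod(shape)

print(shape)
print(num_values)
```

With another minibatch, only num_videos and frames_per_video would change, while the static part of the shape would stay the same.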

Working with sequences in CNTK

Let us understand sequences in CNTK by first learning about the long short-term memory network.

Long Short-Term Memory Network (LSTM)

[Figure: long short-term memory network]

Long short-term memory (LSTM) networks were introduced by Hochreiter & Schmidhuber. They solve the problem of getting a basic recurrent layer to remember things for a long time. The architecture of an LSTM is shown in the diagram above. As we can see, it has input neurons, memory cells and output neurons. To combat the vanishing gradient problem, LSTM networks use an explicit memory cell (which stores the previous values) and the following gates −

  1. Forget gate − As the name implies, it tells the memory cell to forget the previous values. The memory cell stores the values until the forget gate tells it to forget them.

  2. Input gate − As the name implies, it adds new information to the cell.

  3. Output gate − As the name implies, it decides when to pass the vectors from the cell along to the next hidden state.
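The interplay of the three gates can be sketched as a single LSTM time step in NumPy. This follows the standard textbook formulation and is not CNTK's internal implementation; the function lstm_step, the stacked weight layout and the random initial values are assumptions made for this illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """Run one LSTM time step and return the new hidden and cell state."""
    hidden = h_prev.shape[0]
    # All four pre-activations are computed at once, shape (4 * hidden,).
    gates = W @ x + U @ h_prev + b
    f = sigmoid(gates[0:hidden])               # forget gate
    i = sigmoid(gates[hidden:2 * hidden])      # input gate
    o = sigmoid(gates[2 * hidden:3 * hidden])  # output gate
    g = np.tanh(gates[3 * hidden:])            # candidate values
    # Forget part of the old memory, then add gated new information.
    c = f * c_prev + i * g
    # The output gate decides how much of the cell becomes the hidden state.
    h = o * np.tanh(c)
    return h, c


rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4
W = rng.normal(scale=0.1, size=(4 * hidden_dim, input_dim))
U = rng.normal(scale=0.1, size=(4 * hidden_dim, hidden_dim))
b = np.zeros(4 * hidden_dim)

h = np.zeros(hidden_dim)
c = np.zeros(hidden_dim)
for x in rng.normal(size=(5, input_dim)):  # a sequence of five input vectors
    h, c = lstm_step(x, h, c, W, U, b)
print(h)
```

The cell state c is what carries information across many time steps, and the hidden state h is what the next layer sees at each step.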

Working with sequences in CNTK is straightforward. Let us see it with the help of the following example −

import sys
import os
from cntk import Trainer, Axis
from cntk.io import MinibatchSource, CTFDeserializer, StreamDef, StreamDefs,\
    INFINITELY_REPEAT
from cntk.learners import sgd, learning_parameter_schedule_per_sample
from cntk import input_variable, cross_entropy_with_softmax, \
    classification_error, sequence
from cntk.logging import ProgressPrinter
from cntk.layers import Sequential, Embedding, Recurrence, LSTM, Dense

# Create a reader for a CTF file with sparse features ('x') and dense labels ('y').
def create_reader(path, is_training, input_dim, label_dim):
    return MinibatchSource(CTFDeserializer(path, StreamDefs(
        features=StreamDef(field='x', shape=input_dim, is_sparse=True),
        labels=StreamDef(field='y', shape=label_dim, is_sparse=False)
    )), randomize=is_training,
    max_sweeps=INFINITELY_REPEAT if is_training else 1)

# Embedding -> LSTM recurrence -> last step of the sequence -> dense output layer.
def LSTM_sequence_classifier_net(input, num_output_classes, embedding_dim,
                                 LSTM_dim, cell_dim):
    lstm_classifier = Sequential([Embedding(embedding_dim),
                                  Recurrence(LSTM(LSTM_dim, cell_dim)),
                                  sequence.last,
                                  Dense(num_output_classes)])
    return lstm_classifier(input)

def train_sequence_classifier():
    input_dim = 2000
    cell_dim = 25
    hidden_dim = 25
    embedding_dim = 50
    num_output_classes = 5

    # Input is a sequence of sparse vectors; there is one label per sequence.
    features = sequence.input_variable(shape=input_dim, is_sparse=True)
    label = input_variable(num_output_classes)

    classifier_output = LSTM_sequence_classifier_net(
        features, num_output_classes, embedding_dim, hidden_dim, cell_dim)
    ce = cross_entropy_with_softmax(classifier_output, label)
    pe = classification_error(classifier_output, label)

    rel_path = ("../../../Tests/EndToEndTests/Text/" +
                "SequenceClassification/Data/Train.ctf")
    path = os.path.join(os.path.dirname(os.path.abspath(__file__)), rel_path)
    reader = create_reader(path, True, input_dim, num_output_classes)
    input_map = {
        features: reader.streams.features,
        label: reader.streams.labels
    }

    lr_per_sample = learning_parameter_schedule_per_sample(0.0005)
    progress_printer = ProgressPrinter(0)
    trainer = Trainer(classifier_output, (ce, pe),
                      sgd(classifier_output.parameters, lr=lr_per_sample),
                      progress_printer)

    # Train on 255 minibatches of (up to) 200 samples each.
    minibatch_size = 200
    for i in range(255):
        mb = reader.next_minibatch(minibatch_size, input_map=input_map)
        trainer.train_minibatch(mb)

    evaluation_average = float(trainer.previous_minibatch_evaluation_average)
    loss_average = float(trainer.previous_minibatch_loss_average)
    return evaluation_average, loss_average

if __name__ == '__main__':
    error, _ = train_sequence_classifier()
    print(" error: %f" % error)

Output

average  since  average  since  examples
loss     last   metric   last
------------------------------------------------------
1.61    1.61    0.886     0.886     44
1.61     1.6    0.714     0.629    133
 1.6    1.59     0.56     0.448    316
1.57    1.55    0.479      0.41    682
1.53     1.5    0.464     0.449   1379
1.46     1.4    0.453     0.441   2813
1.37    1.28     0.45     0.447   5679
 1.3    1.23    0.448     0.447  11365

error: 0.333333

The above program is explained in detail in later sections, especially where we construct recurrent neural networks.

CNTK - Logistic Regression Model

This chapter deals with constructing a logistic regression model in CNTK.

Basics of Logistic Regression model

Logistic Regression, one of the simplest ML techniques, is used especially for binary classification, that is, for building a prediction model when the variable to predict can take one of just two categorical values. One of the simplest examples is predicting whether a person is male or female based on the person’s age, voice, hair and so on.

Example

Let us understand the concept of Logistic Regression mathematically with the help of another example −

Suppose we want to predict the creditworthiness of a loan application, where 0 means reject and 1 means approve, based on the applicant’s debt, income and credit rating. We represent debt with X1, income with X2 and credit rating with X3.

In Logistic Regression, we determine a weight value, represented by w, for every feature and a single bias value, represented by b.

Now suppose,

X1 = 3.0
X2 = -2.0
X3 = 1.0

And suppose we determine the weights and bias as follows −

W1 = 0.65, W2 = 1.75, W3 = 2.05 and b = 0.33

Now, to predict the class, we need to apply the following formula −

Z = (X1*W1) + (X2*W2) + (X3*W3) + b
  = (3.0)*(0.65) + (-2.0)*(1.75) + (1.0)*(2.05) + 0.33
  = 0.83

Next, we need to compute P = 1.0/(1.0 + exp(-Z)). Here, exp(x) denotes Euler’s number e raised to the power x.

P = 1.0/(1.0 + exp(-0.83))
  = 0.6964
The value P can be interpreted as the probability that the class is 1. If P < 0.5, the prediction is class = 0; otherwise (P >= 0.5) the prediction is class = 1.
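The calculation above can be reproduced in a few lines of plain Python. This is a minimal sketch using only the values given in the example; the variable names are chosen for illustration.

```python
import math

x = [3.0, -2.0, 1.0]      # debt, income, credit rating
w = [0.65, 1.75, 2.05]    # one weight per feature
b = 0.33                  # single bias value

# Weighted sum of the features plus the bias.
z = sum(xi * wi for xi, wi in zip(x, w)) + b

# The logistic (sigmoid) function maps z to a probability between 0 and 1.
p = 1.0 / (1.0 + math.exp(-z))

predicted_class = 1 if p >= 0.5 else 0
print("z = %.2f, p = %.4f, class = %d" % (z, p, predicted_class))
```

Since p is above 0.5, the loan application in this example would be classified as class 1 (approve).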

To determine the values of the weights and bias, we need a set of training data with known input predictor values and known correct class labels. We can then use an algorithm, generally gradient descent, to find the values of the weights and bias.
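To illustrate how gradient descent finds such values, here is a minimal sketch in plain Python. The four training rows, the learning rate and the number of iterations are hypothetical choices for this illustration; CNTK performs the equivalent updates through its learners.

```python
import math

# Hypothetical toy data: ((x1, x2), label)
data = [((1.0, 4.0), 1), ((2.0, 5.0), 1), ((5.0, 2.0), 0), ((8.0, 1.0), 0)]


def predict(w, b, x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def mean_loss(w, b):
    """Mean binary cross-entropy over the training data."""
    total = 0.0
    for x, y in data:
        p = predict(w, b, x)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)


w, b = [0.0, 0.0], 0.0
learning_rate = 0.05
initial_loss = mean_loss(w, b)

for _ in range(500):
    grad_w, grad_b = [0.0, 0.0], 0.0
    for x, y in data:
        error = predict(w, b, x) - y   # derivative of the loss w.r.t. z
        for j in range(len(w)):
            grad_w[j] += error * x[j] / len(data)
        grad_b += error / len(data)
    # Move the weights and the bias a small step against the gradient.
    w = [wj - learning_rate * gj for wj, gj in zip(w, grad_w)]
    b -= learning_rate * grad_b

final_loss = mean_loss(w, b)
print("loss before: %.4f, loss after: %.4f" % (initial_loss, final_loss))
```

Each pass lowers the cross-entropy loss a little, which is the same principle the CNTK trainer applies one minibatch at a time.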

LR model implementation example

For this LR model, we are going to use the following data set, in which the first two columns of each row are the features and the last column is the class label −

1.0, 2.0, 0
3.0, 4.0, 0
5.0, 2.0, 0
6.0, 3.0, 0
8.0, 1.0, 0
9.0, 2.0, 0
1.0, 4.0, 1
2.0, 5.0, 1
4.0, 6.0, 1
6.0, 5.0, 1
7.0, 3.0, 1
8.0, 5.0, 1

To start this LR model implementation in CNTK, we first need to import the following packages −

import numpy as np
import cntk as C

The program is structured around a main() function as follows −

def main():
    print("Using CNTK version = " + str(C.__version__) + "\n")

Now, we need to load the training data into memory as follows −

    data_file = ".\\dataLRmodel.txt"
    print("Loading data from " + data_file + "\n")
    # Columns 0 and 1 hold the features, column 2 holds the class label
    features_mat = np.loadtxt(data_file, dtype=np.float32, delimiter=",", skiprows=0, usecols=[0,1])
    labels_mat = np.loadtxt(data_file, dtype=np.float32, delimiter=",", skiprows=0, usecols=[2], ndmin=2)
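To see what these two calls return, the sketch below runs the same np.loadtxt arguments on a small in-memory sample instead of a file. The three sample rows are taken from the data set above; io.StringIO is used here only so that the example runs without dataLRmodel.txt.

```python
import io

import numpy as np

sample = "1.0, 2.0, 0\n3.0, 4.0, 0\n1.0, 4.0, 1\n"

# Same arguments as in the tutorial, reading from memory instead of a file.
features_mat = np.loadtxt(io.StringIO(sample), dtype=np.float32,
                          delimiter=",", usecols=[0, 1])
labels_mat = np.loadtxt(io.StringIO(sample), dtype=np.float32,
                        delimiter=",", usecols=[2], ndmin=2)

print(features_mat.shape)
print(labels_mat.shape)
```

The ndmin=2 argument keeps the labels as a column matrix with one row per training item, so that features_mat[row] and labels_mat[row] line up when a minibatch is built.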

Now, we create a logistic regression model that is compatible with the training data −

    features_dim = 2
    labels_dim = 1
    X = C.ops.input_variable(features_dim, np.float32)
    y = C.input_variable(labels_dim, np.float32)
    W = C.parameter(shape=(features_dim, 1)) # trainable cntk.Parameter
    b = C.parameter(shape=(labels_dim))
    z = C.times(X, W) + b
    p = 1.0 / (1.0 + C.exp(-z)) # logistic sigmoid
    model = p

Now, we need to create the learner and the trainer as follows −

    ce_error = C.binary_cross_entropy(model, y) # CE a bit more principled for LR
    fixed_lr = 0.010
    learner = C.sgd(model.parameters, fixed_lr)
    trainer = C.Trainer(model, (ce_error), [learner])
    max_iterations = 4000

LR Model training

Once we have created the LR model, it is time to start the training process −

    np.random.seed(4)
    N = len(features_mat)
    for i in range(0, max_iterations):
        row = np.random.choice(N,1) # pick a random row from training items
        trainer.train_minibatch({ X: features_mat[row], y: labels_mat[row] })
        if i % 1000 == 0 and i > 0:
            mcee = trainer.previous_minibatch_loss_average
            print(str(i) + " Cross-entropy error on curr item = %0.4f " % mcee)

Now, with the help of the following code, we can print the model weights and bias −

    np.set_printoptions(precision=4, suppress=True)
    print("Model weights: ")
    print(W.value)
    print("Model bias:")
    print(b.value)
    print("")

if __name__ == "__main__":
    main()

Training a Logistic Regression model - Complete example

import numpy as np
import cntk as C

def main():
    print("Using CNTK version = " + str(C.__version__) + "\n")

    # Load the training data: two feature columns and one label column
    data_file = ".\\dataLRmodel.txt" # provide the name and the location of data file
    print("Loading data from " + data_file + "\n")
    features_mat = np.loadtxt(data_file, dtype=np.float32, delimiter=",", skiprows=0, usecols=[0,1])
    labels_mat = np.loadtxt(data_file, dtype=np.float32, delimiter=",", skiprows=0, usecols=[2], ndmin=2)

    # Define the logistic regression model
    features_dim = 2
    labels_dim = 1
    X = C.ops.input_variable(features_dim, np.float32)
    y = C.input_variable(labels_dim, np.float32)
    W = C.parameter(shape=(features_dim, 1)) # trainable cntk.Parameter
    b = C.parameter(shape=(labels_dim))
    z = C.times(X, W) + b
    p = 1.0 / (1.0 + C.exp(-z))
    model = p

    # Create the learner and the trainer
    ce_error = C.binary_cross_entropy(model, y) # CE a bit more principled for LR
    fixed_lr = 0.010
    learner = C.sgd(model.parameters, fixed_lr)
    trainer = C.Trainer(model, (ce_error), [learner])
    max_iterations = 4000

    # Train on one randomly chosen item per iteration
    np.random.seed(4)
    N = len(features_mat)
    for i in range(0, max_iterations):
        row = np.random.choice(N,1) # pick a random row from training items
        trainer.train_minibatch({ X: features_mat[row], y: labels_mat[row] })
        if i % 1000 == 0 and i > 0:
            mcee = trainer.previous_minibatch_loss_average
            print(str(i) + " Cross-entropy error on curr item = %0.4f " % mcee)

    # Print the trained weights and bias
    np.set_printoptions(precision=4, suppress=True)
    print("Model weights: ")
    print(W.value)
    print("Model bias:")
    print(b.value)

if __name__ == "__main__":
    main()

Output

Using CNTK version = 2.7
1000 Cross-entropy error on curr item = 0.1941
2000 Cross-entropy error on curr item = 0.1746
3000 Cross-entropy error on curr item = 0.0563
Model weights:
[[-0.2049]
 [ 0.9666]]
Model bias:
[-2.2846]

Prediction using trained LR Model

Once the LR model has been trained, we can use it for prediction as follows −

First, our evaluation program imports the numpy package and loads the training data into a feature matrix and a class label matrix in the same way as the training program above −

import numpy as np

def main():
    data_file = ".\\dataLRmodel.txt" # provide the name and the location of data file
    features_mat = np.loadtxt(data_file, dtype=np.float32, delimiter=",",
        skiprows=0, usecols=(0,1))
    labels_mat = np.loadtxt(data_file, dtype=np.float32, delimiter=",",
        skiprows=0, usecols=[2], ndmin=2)

Next, we set the values of the weights and the bias that were determined by our training program −

    print("Setting weights and bias values \n")
    weights = np.array([0.0925, 1.1722], dtype=np.float32)
    bias = np.array([-4.5400], dtype=np.float32)
    N = len(features_mat)
    features_dim = 2

Next, our evaluation program computes the logistic regression probability by walking through each training item as follows −

    print("item pred_prob pred_label act_label result")
    for i in range(0, N): # each item
        x = features_mat[i]
        z = 0.0
        for j in range(0, features_dim):
            z += x[j] * weights[j]
        z += bias[0]
        pred_prob = 1.0 / (1.0 + np.exp(-z))
        pred_label = 0 if pred_prob < 0.5 else 1
        act_label = labels_mat[i, 0]
        pred_str = 'correct' if np.absolute(pred_label - act_label) < 1.0e-5 \
            else 'WRONG'
        print("%2d %0.4f %0.0f %0.0f %s" % (i, pred_prob, pred_label, act_label, pred_str))

Now let us demonstrate how to make a prediction for a new, previously unseen item −

x = np.array([9.5, 4.5], dtype=np.float32)
print("\nPredicting class for age, education = ")
print(x)
z = 0.0
for j in range(0, features_dim):
  z += x[j] * weights[j]
z += bias[0]
p = 1.0 / (1.0 + np.exp(-z))
print("Predicted p = " + str(p))
if p < 0.5: print("Predicted class = 0")
else: print("Predicted class = 1")
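
The prediction steps above can also be wrapped in two small helper functions. The following is a minimal sketch in plain Python (no numpy or CNTK needed); the function names predict_proba and predict_class are hypothetical and are not part of the original program. It uses the same weights and bias that the evaluation program sets.

```python
import math

def predict_proba(x, weights, bias):
    """Return the logistic-regression probability of class 1 for one item."""
    # Weighted sum of the features plus the bias term
    z = sum(xj * wj for xj, wj in zip(x, weights)) + bias
    # The logistic (sigmoid) function squashes z into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(x, weights, bias, threshold=0.5):
    """Map the probability to a class label using a decision threshold."""
    return 0 if predict_proba(x, weights, bias) < threshold else 1

# Weights and bias as set in the evaluation program above
weights = [0.0925, 1.1722]
bias = -4.5400

p = predict_proba([9.5, 4.5], weights, bias)
label = predict_class([9.5, 4.5], weights, bias)
print("p = %0.4f, class = %d" % (p, label))
```

Keeping the probability and the thresholding in separate functions makes it easy to try a decision threshold other than 0.5.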

Complete prediction evaluation program

import numpy as np
def main():
  data_file = ".\\dataLRmodel.txt" # provide the name and the location of data file
  features_mat = np.loadtxt(data_file, dtype=np.float32, delimiter=",",
    skiprows=0, usecols=(0,1))
  labels_mat = np.loadtxt(data_file, dtype=np.float32, delimiter=",",
    skiprows=0, usecols=[2], ndmin=2)
  print("Setting weights and bias values \n")
  weights = np.array([0.0925, 1.1722], dtype=np.float32)
  bias = np.array([-4.5400], dtype=np.float32)
  N = len(features_mat)
  features_dim = 2
  print("item pred_prob pred_label act_label result")
  for i in range(0, N): # each item
    x = features_mat[i]
    z = 0.0
    for j in range(0, features_dim):
      z += x[j] * weights[j]
    z += bias[0]
    pred_prob = 1.0 / (1.0 + np.exp(-z))
    pred_label = 0 if pred_prob < 0.5 else 1
    act_label = labels_mat[i, 0] # scalar label (labels_mat has shape (N, 1))
    pred_str = 'correct' if np.absolute(pred_label - act_label) < 1.0e-5 \
      else 'WRONG'
    print("%2d %0.4f %0.0f %0.0f %s" % (i, pred_prob, pred_label, act_label, pred_str))
  # predict the class of a new, previously unseen item
  x = np.array([9.5, 4.5], dtype=np.float32)
  print("\nPredicting class for age, education = ")
  print(x)
  z = 0.0
  for j in range(0, features_dim):
    z += x[j] * weights[j]
  z += bias[0]
  p = 1.0 / (1.0 + np.exp(-z))
  print("Predicted p = " + str(p))
  if p < 0.5: print("Predicted class = 0")
  else: print("Predicted class = 1")
if __name__ == "__main__":
  main()

Output

Setting weights and bias values.

Item  pred_prob  pred_label  act_label  result
0   0.3640         0             0     correct
1   0.7254         1             0      WRONG
2   0.2019         0             0     correct
3   0.3562         0             0     correct
4   0.0493         0             0     correct
5   0.1005         0             0     correct
6   0.7892         1             1     correct
7   0.8564         1             1     correct
8   0.9654         1             1     correct
9   0.7587         1             1     correct
10  0.3040         0             1      WRONG
11  0.7129         1             1     correct
Predicting class for age, education =
[9.5 4.5]
Predicted p = 0.526487952
Predicted class = 1

CNTK - Neural Network (NN) Concepts

This chapter covers the neural network (NN) concepts used in CNTK.

A neural network is built from several layers of neurons. In CNTK, these layers are modelled with the layer functions defined in the layers module.

Layer function

Working with layers in CNTK has a distinct functional-programming feel. A layer function looks like a regular function, and it produces a mathematical function with a set of predefined parameters. Let us see how to create the most basic layer type, Dense, with the help of a layer function.

Example

With the help of the following basic steps, we can create the most basic layer type −

Step 1 − First, we need to import the Dense layer function from the layers package of CNTK.

from cntk.layers import Dense

Step 2 − Next, we need to import the input_variable function from the CNTK root package.

from cntk import input_variable

Step 3 − Now, we need to create a new input variable using the input_variable function. We also need to provide its size.

feature = input_variable(100)

Step 4 − At last, we create a new layer using the Dense function, providing the number of neurons we want.

layer = Dense(40)(feature)

Here, Dense(40) returns the configured layer function, and invoking it with feature connects the Dense layer to the input.

Complete implementation example

from cntk.layers import Dense
from cntk import input_variable
feature= input_variable(100)
layer = Dense(40)(feature)
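
To make it concrete what such a layer computes, here is a rough sketch in plain Python of the forward pass of a dense layer with 100 inputs and 40 neurons. It assumes a bias term and no activation function, and it uses arbitrary random weights; the function name dense_forward is hypothetical, and this illustrates the mathematics only, not CNTK's implementation.

```python
import random

def dense_forward(x, W, b):
    """Compute y[j] = sum_i x[i] * W[i][j] + b[j] for every neuron j."""
    n_out = len(b)
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
            for j in range(n_out)]

random.seed(0)
n_in, n_out = 100, 40                      # same sizes as in the example above
x = [random.random() for _ in range(n_in)]
# One weight per (input, neuron) pair and one bias per neuron
W = [[random.uniform(-0.1, 0.1) for _ in range(n_out)] for _ in range(n_in)]
b = [0.0] * n_out

y = dense_forward(x, W, b)
print(len(y))                              # one output value per neuron
```

The layer therefore holds 100 x 40 weights plus 40 biases, and these are the parameters that training adjusts.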

Customizing layers

As we have seen, CNTK provides a good set of defaults for building NNs. The behavior and the performance of a NN depend on the activation function and the other settings we choose, so it is worth understanding what we can configure.

Steps to configure a Dense layer

Each layer in a NN has its own configuration options. For a Dense layer, the following settings are the important ones −

  1. shape − As name implies, it defines the output shape of the layer which further determines the number of neurons in that layer.

  2. activation − It defines the activation function of that layer, so it can transform the input data.

  3. init − It defines the initialisation function of that layer. It will initialise the parameters of the layer when we start training the NN.

Let us see the steps to configure a Dense layer −

Step 1 − First, we need to import the Dense layer function from the layers package of CNTK.

from cntk.layers import Dense

Step 2 − Next, we need to import the sigmoid operator from the CNTK ops package. It will be used as the activation function.

from cntk.ops import sigmoid

Step 3 − Now, we need to import the glorot_uniform initializer from the initializer package.

from cntk.initializer import glorot_uniform

Step 4 − At last, we create a new layer using the Dense function, providing the number of neurons as the first argument. We also provide the sigmoid operator as the activation function and glorot_uniform as the init function for the layer.

layer = Dense(50, activation = sigmoid, init = glorot_uniform)

Complete implementation example −

from cntk.layers import Dense
from cntk.ops import sigmoid
from cntk.initializer import glorot_uniform
layer = Dense(50, activation = sigmoid, init = glorot_uniform)
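
As an illustration of what the init and activation settings mean, the sketch below uses the textbook Glorot (Xavier) uniform formulation, which draws weights from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)). This is an assumption made for illustration, and CNTK's own initializer may differ in detail (for example in scaling). The helper name glorot_uniform_matrix and the input size of 4 are hypothetical.

```python
import math
import random

def glorot_uniform_matrix(fan_in, fan_out, rng):
    """Sample a fan_in x fan_out weight matrix from U(-limit, limit)."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

def sigmoid(v):
    """Logistic activation: maps any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

rng = random.Random(42)
fan_in, fan_out = 4, 50                     # 4 inputs (assumed), 50 neurons
W = glorot_uniform_matrix(fan_in, fan_out, rng)
limit = math.sqrt(6.0 / (fan_in + fan_out))

x = [0.5, -1.0, 2.0, 0.0]
# Weighted sum per neuron, followed by the sigmoid activation
outputs = [sigmoid(sum(x[i] * W[i][j] for i in range(fan_in)))
           for j in range(fan_out)]
```

Small initial weights keep the weighted sums close to zero, where the sigmoid is still sensitive to changes in its input.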

Optimizing the parameters

Till now, we have seen how to create the structure of a NN and how to configure its various settings. Here, we will see how to optimise the parameters of a NN. This is done with a combination of two components, namely learners and trainers.

trainer component

The first component used to optimise the parameters of a NN is the trainer component. It implements the backpropagation process: it passes the data through the NN to obtain a prediction.

After that, it uses another component, called the learner, to obtain new values for the parameters of the NN. Once it has the new values, it applies them and repeats the process until an exit criterion is met.

learner component

The second component used to optimise the parameters of a NN is the learner component, which is responsible for performing the gradient descent algorithm.

Learners included in the CNTK library

Following is a list of some of the interesting learners included in the CNTK library −

  1. Stochastic Gradient Descent (SGD) − This learner represents the basic stochastic gradient descent, without any extras.

  2. Momentum Stochastic Gradient Descent (MomentumSGD) − On top of SGD, this learner applies momentum to overcome the problem of getting stuck in local minima.

  3. RMSProp − This learner, in order to control the rate of descent, uses decaying learning rates.

  4. Adam − This learner, in order to decrease the rate of descent over time, uses decaying momentum.

  5. Adagrad − This learner, for frequently as well as infrequently occurring features, uses different learning rates.
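
The difference between plain gradient descent and its momentum variant can be sketched in a few lines of plain Python. The example minimises the toy function f(w) = (w - 3)^2 using the exact gradient rather than a stochastic one, and it uses the classical momentum update. The function names and the hyperparameter values are assumptions chosen for illustration; CNTK's learners may formulate the update differently.

```python
def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

def sgd(w, lr, steps):
    """Plain gradient descent: step against the gradient."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def momentum_sgd(w, lr, momentum, steps):
    """Gradient descent with classical momentum: the velocity accumulates
    a decaying sum of past gradients, which smooths the updates."""
    velocity = 0.0
    for _ in range(steps):
        velocity = momentum * velocity - lr * grad(w)
        w += velocity
    return w

w_sgd = sgd(0.0, lr=0.1, steps=100)
w_mom = momentum_sgd(0.0, lr=0.1, momentum=0.9, steps=300)
print(w_sgd, w_mom)   # both approach the minimum at w = 3
```

In this sketch the learner's job is only the update rule; computing the gradient and feeding the data is what the trainer component does.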

CNTK - Creating First Neural Network

This chapter elaborates on creating a neural network in CNTK.

Build the network structure

In order to apply CNTK concepts to build our first NN, we are going to use a NN to classify species of iris flowers based on the physical properties of sepal width and length, and petal width and length. We will use the iris dataset, which records the following for different varieties of iris flowers −

  1. Sepal length

  2. Sepal width

  3. Petal length

  4. Petal width

  5. Class i.e. iris setosa or iris versicolor or iris virginica

Here, we will be building a regular NN called a feedforward NN. Let us see the implementation steps to build the structure of the NN −

Step 1 − First, we import the necessary components from the CNTK library: our layer types, the activation functions, and a function that allows us to define an input variable for our NN.

from cntk import default_options, input_variable
from cntk.layers import Dense, Sequential
from cntk.ops import log_softmax, relu

Step 2 − After that, we create our model using the Sequential function and feed it the layers we want. Here, we create two distinct layers in our NN: one with four neurons and another with three neurons.

model = Sequential([Dense(4, activation=relu), Dense(3, activation=log_softmax)])

Step 3 − At last, in order to compile the NN, we bind the network to the input variable. The network has an input layer with four neurons and an output layer with three neurons.

feature= input_variable(4)
z = model(feature)

Applying an activation function

There are many activation functions to choose from, and choosing the right one makes a big difference to how well our deep learning model performs.

At the output layer

The choice of activation function at the output layer depends on the kind of problem we are going to solve with our model.

  1. For a regression problem, we should use a linear activation function on the output layer.

  2. For a binary classification problem, we should use a sigmoid activation function on the output layer.

  3. For multi-class classification problem, we should use a softmax activation function on the output layer.

Here, we are going to build a model that predicts one of three classes, so we need to use the softmax activation function at the output layer.
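
For a multi-class output layer, softmax turns the raw scores of the neurons into probabilities that sum to 1. A minimal sketch in plain Python follows; the input scores are made-up values, not output from the iris model.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])           # made-up scores for three classes
predicted_class = probs.index(max(probs))  # index of the most probable class
print(probs, predicted_class)
```

The largest score always receives the largest probability, so the predicted class is the same before and after applying softmax; what changes is that the outputs can be read as probabilities.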

At the hidden layer

Choosing an activation function for the hidden layer requires some experimentation: we monitor the performance to see which activation function works well.

  1. In a classification problem, we need to predict the probability that a sample belongs to a specific class. That is why we need an activation function that gives us probabilistic values. The sigmoid activation function can help us reach this goal.

  2. One of the major problems associated with the sigmoid function is the vanishing gradient problem. To overcome it, we can use the ReLU activation function, which converts all negative values to zero and works as a pass-through filter for positive values.
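
The vanishing gradient problem mentioned above can be seen by comparing the derivatives of the two functions. The following plain-Python sketch is for illustration only; the function names are not CNTK identifiers.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def sigmoid_grad(v):
    """Derivative of sigmoid: s * (1 - s). It never exceeds 0.25 and
    shrinks towards 0 for inputs far from 0."""
    s = sigmoid(v)
    return s * (1.0 - s)

def relu(v):
    """ReLU: zero for negative inputs, pass-through for positive inputs."""
    return max(0.0, v)

def relu_grad(v):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if v > 0 else 0.0

for v in (-10.0, -1.0, 0.5, 10.0):
    print(v, relu(v), sigmoid_grad(v), relu_grad(v))
```

Because backpropagation multiplies such derivatives layer by layer, a factor that is at most 0.25 shrinks the gradient quickly in deep networks, while the ReLU derivative stays at 1 for active neurons.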

Picking a loss function

Once we have the structure of our NN model, we have to optimise it, and for that we need a loss function. Unlike activation functions, there are only a few loss functions to choose from. The choice again depends on the kind of problem we are going to solve with our model.

For example, in a classification problem, we should use a loss function that measures the difference between the predicted class and the actual class.

loss function

For the classification problem we are going to solve with our NN model, the categorical cross-entropy loss function is the best candidate. In CNTK, it is implemented as cross_entropy_with_softmax, which can be imported from the cntk.losses package as follows −

from cntk.losses import cross_entropy_with_softmax
label = input_variable(3)
loss = cross_entropy_with_softmax(z, label)
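
To show what a categorical cross-entropy loss measures, here is a rough plain-Python sketch that applies softmax to raw scores and takes the negative log-probability of the true class. The function name cross_entropy_with_softmax_sketch and the scores are hypothetical; this illustrates the formula and is not CNTK's implementation.

```python
import math

def cross_entropy_with_softmax_sketch(scores, one_hot_label):
    """Negative log of the softmax probability assigned to the true class."""
    m = max(scores)
    # log(sum(exp(scores))), computed in a numerically stable way
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    log_probs = [s - log_sum_exp for s in scores]
    return -sum(t * lp for t, lp in zip(one_hot_label, log_probs))

true_label = [1.0, 0.0, 0.0]               # the sample belongs to class 0
loss_good = cross_entropy_with_softmax_sketch([5.0, 0.1, 0.1], true_label)
loss_bad = cross_entropy_with_softmax_sketch([0.1, 5.0, 0.1], true_label)
print(loss_good, loss_bad)                 # the wrong prediction costs more
```

The loss is small when the network gives the true class a high score and grows as the score of the true class falls behind the others, which is the signal the optimiser needs.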

Metrics

With the structure of our NN model and a loss function in place, we have all the ingredients to start optimising our deep learning model. But before diving into that, we should learn about metrics.

cntk.metrics

CNTK has a package named cntk.metrics from which we can import the metrics we are going to use. As we are building a classification model, we will use the classification_error metric, which produces a number between 0 and 1. This number is the fraction of samples that were predicted incorrectly, so lower is better.

First, we need to import the metric from the cntk.metrics package −

from cntk.metrics import classification_error
error_rate = classification_error(z, label)

The above function takes the output of the NN and the expected label as input.
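
A classification error of this kind can be computed by hand as the share of samples whose highest-scoring output does not match the true class. The sketch below is plain Python with made-up outputs; the function name classification_error_sketch is hypothetical.

```python
def classification_error_sketch(outputs, one_hot_labels):
    """Fraction of samples whose predicted class differs from the true class."""
    wrong = 0
    for out, label in zip(outputs, one_hot_labels):
        predicted = out.index(max(out))    # neuron with the highest value
        actual = label.index(max(label))   # position of the 1 in the one-hot label
        if predicted != actual:
            wrong += 1
    return wrong / len(outputs)

# Made-up network outputs and labels for four samples
outputs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.4, 0.3], [0.2, 0.2, 0.6]]
labels = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]

error = classification_error_sketch(outputs, labels)
accuracy = 1.0 - error
print(error, accuracy)
```

Accuracy, when needed, is simply one minus this error.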

CNTK - Training the Neural Network

Here, we will learn how to train a neural network in CNTK.

Training a model in CNTK

In the previous section, we defined all the components of the deep learning model. Now it is time to train it. As we discussed earlier, we can train a NN model in CNTK using the combination of a learner and a trainer.

Choosing a learner and setting up training

In this section, we will define the learner. CNTK provides several learners to choose from. For the model defined in the previous sections, we will use the Stochastic Gradient Descent (SGD) learner.

In order to train the neural network, let us configure the learner and the trainer with the help of the following steps −

Step 1 − First, we need to import the sgd function from the cntk.learners package.

from cntk.learners import sgd

Step 2 − Next, we need to import the Trainer class from the cntk.train.trainer package.

from cntk.train.trainer import Trainer

Step 3 − Now, we need to create a learner. It is created by invoking the sgd function with the model's parameters and a value for the learning rate.

learner = sgd(z.parameters, 0.01)

Step 4 − At last, we need to initialize the trainer. It must be given the network, the combination of the loss and the metric, and the learner.

trainer = Trainer(z, (loss, error_rate), [learner])

The learning rate, which controls the speed of optimisation, should be a small number between 0.001 and 0.1.

Choosing a learner and setting up the training - Complete example

from cntk.learners import sgd
from cntk.train.trainer import Trainer
learner = sgd(z.parameters, 0.01)
trainer = Trainer(z, (loss, error_rate), [learner])

Feeding data into the trainer

Once we have chosen and configured the trainer, it is time to load the dataset. We have saved the iris dataset as a .CSV file, and we will use the data wrangling package named pandas to load it.

Steps to load the dataset from .CSV file

Step 1 − First, we need to import the pandas package.

import pandas as pd

Step 2 − Now, we need to invoke the read_csv function to load the .csv file from the disk.

df_source = pd.read_csv('iris.csv', names = ['sepal_length', 'sepal_width',
'petal_length', 'petal_width', 'species'], index_col=False)

Once we load the dataset, we need to split it into a set of features and a label.

Steps to split the dataset into features and label

Step 1 − First, we need to select all rows and the first four columns from the dataset. This can be done with the iloc indexer.

X = df_source.iloc[:, :4].values

Step 2 − Next, we need to select the species column from the iris dataset. We use the values property to access the underlying numpy array.

y = df_source['species'].values

Steps to encode the species column to a numeric vector representation

As we discussed earlier, our model is a classification model, and it requires numeric input values. Hence, we need to encode the species column as a numeric vector representation. Let us see the steps to do it −

Step 1 − First, we define the label_mapping dictionary, which maps each species name to a number. Later, a list expression iterates over all elements in the array and looks up each value in this dictionary.

label_mapping = {'Iris-Setosa' : 0, 'Iris-Versicolor' : 1, 'Iris-Virginica' : 2}

Step 2 − Next, we convert the looked-up numeric value to a one-hot encoded vector. We use the one_hot function as follows −

import numpy as np
def one_hot(index, length):
  result = np.zeros(length)
  result[index] = 1
  return result

Step 3 − At last, we need to turn the converted list into a numpy array.

y = np.array([one_hot(label_mapping[v], 3) for v in y])
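
The encoding steps can be traced on a handful of values. This is a small plain-Python sketch (lists instead of numpy arrays); the three species values are hypothetical sample inputs, not rows read from the CSV file.

```python
label_mapping = {'Iris-Setosa': 0, 'Iris-Versicolor': 1, 'Iris-Virginica': 2}

def one_hot(index, length):
    """Return a vector of zeros with a single 1.0 at the given index."""
    result = [0.0] * length
    result[index] = 1.0
    return result

# Hypothetical sample of species names
species = ['Iris-Versicolor', 'Iris-Setosa', 'Iris-Virginica']

# Look up the class index of each name, then one-hot encode it
encoded = [one_hot(label_mapping[name], 3) for name in species]
# Decoding is the reverse: find the position of the 1.0
decoded = [row.index(1.0) for row in encoded]
print(encoded, decoded)
```

The dictionary lookup raises a KeyError for any name it does not contain, so the keys must match the strings in the species column exactly, including letter case.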

Steps to detect overfitting

Overfitting is the situation in which the model memorises the training samples but cannot deduce general rules from them. With the help of the following steps, we can detect overfitting in our model −

Step 1 − First, import the train_test_split function from the model_selection module of the sklearn package.

from sklearn.model_selection import train_test_split

Step 2 − Next, we need to invoke the train_test_split function with the features X and the labels y as follows −

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
stratify=y)

We specified a test_size of 0.2 to set aside 20% of the total data for testing.
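
The idea behind a stratified split is that each class keeps the same proportion in the training set and in the test set. The sketch below shows that idea in plain Python; it is a simplified illustration with a hypothetical stratified_split helper, not how scikit-learn implements train_test_split.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_size, seed=0):
    """Return (train_indices, test_indices), keeping class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                       # shuffle within each class
        n_test = round(len(indices) * test_size)   # same share for every class
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Class indices with the same balance as the iris dataset (50 per class)
labels = [0] * 50 + [1] * 50 + [2] * 50
train_idx, test_idx = stratified_split(labels, test_size=0.2)
print(len(train_idx), len(test_idx))
```

A clearly lower error on the training set than on the held-out test set is the usual sign that the model is overfitting.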

Steps to feed training set and validation set to our model

Step 1 − In order to train our model, we first invoke the train_minibatch method. We give it a dictionary that maps the input data to the input variables that we used to define the NN and its associated loss function.

trainer.train_minibatch({ feature: X_train, label: y_train})

Step 2 − Next, call train_minibatch repeatedly by using the following for loop −

for _epoch in range(10):
  trainer.train_minibatch({ feature: X_train, label: y_train})
  # the evaluation average is the classification_error metric, i.e. an error rate
  print('Loss: {}, Error: {}'.format(
    trainer.previous_minibatch_loss_average,
    trainer.previous_minibatch_evaluation_average))

Feeding data into the trainer - Complete example

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
# load the dataset; the fifth column holds the species name
df_source = pd.read_csv('iris.csv', names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'], index_col=False)
X = df_source.iloc[:, :4].values
y = df_source['species'].values
label_mapping = {'Iris-Setosa' : 0, 'Iris-Versicolor' : 1, 'Iris-Virginica' : 2}
def one_hot(index, length):
  result = np.zeros(length)
  result[index] = 1
  return result
# encode the species names as one-hot vectors
y = np.array([one_hot(label_mapping[v], 3) for v in y])
# set aside 20% of the data for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y)
# feature, label and trainer are the objects defined in the previous sections
trainer.train_minibatch({ feature: X_train, label: y_train})
for _epoch in range(10):
  trainer.train_minibatch({ feature: X_train, label: y_train})
  print('Loss: {}, Error: {}'.format(
    trainer.previous_minibatch_loss_average,
    trainer.previous_minibatch_evaluation_average))

Measuring the performance of NN

Whenever we pass data through the trainer to optimise our NN model, the trainer measures the performance of the model with the metric that we configured for it. This measurement, taken during training, is based on the training data. For a full analysis of the model performance, however, we need to use the test data as well.

So, to measure the performance of the model on the test data, we can invoke the test_minibatch method on the trainer as follows −

trainer.test_minibatch({ feature: X_test, label: y_test})

Making prediction with NN

Once we have trained a deep learning model, the most important thing is to make predictions with it. To make a prediction with the NN trained above, we can follow these steps −

Step 1 − First, we need to pick a random item from the test set using the following function −

np.random.choice

Step 2 − Next, we need to select the sample data from the test set by using sample_index.

Step 3 − Now, in order to convert the numeric output of the NN to an actual label, create an inverted mapping.

Step 4 − Now, use the selected sample data and make a prediction by invoking the NN z as a function.

Step 5 − Once we have the predicted output, take the index of the neuron that has the highest value as the predicted value. This can be done with the np.argmax function from the numpy package.

Step 6 − At last, convert the index value into the real label by using inverted_mapping.

Making prediction with NN - Complete example

sample_index = np.random.choice(X_test.shape[0])
sample = X_test[sample_index]
# keys are the class indices 0..2 returned by np.argmax
inverted_mapping = {
   0:'Iris-setosa',
   1:'Iris-versicolor',
   2:'Iris-virginica'
}
prediction = z(sample)
predicted_label = inverted_mapping[np.argmax(prediction)]
print(predicted_label)

Output

After training the above deep learning model and running it, you will get output such as the following (the sample is picked at random, so the label can differ between runs) −

Iris-versicolor
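
The last two prediction steps (argmax and the inverted mapping) can be tried without a trained network. In this plain-Python sketch the inverted mapping is derived from label_mapping, so the two cannot drift apart; sample_output is a made-up network output, and the predicted_label function is a hypothetical helper.

```python
label_mapping = {'Iris-Setosa': 0, 'Iris-Versicolor': 1, 'Iris-Virginica': 2}

# Derive index -> name from name -> index
inverted_mapping = {index: name for name, index in label_mapping.items()}

def predicted_label(network_output):
    """Return the label of the neuron with the highest output value."""
    best_index = max(range(len(network_output)),
                     key=lambda i: network_output[i])
    return inverted_mapping[best_index]

# Made-up log-softmax output for one sample (higher means more likely)
sample_output = [-2.3, -0.2, -1.9]
print(predicted_label(sample_output))
```

Since np.argmax returns indices starting at 0, the inverted mapping must use the keys 0, 1 and 2.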

CNTK - In-Memory and Large Datasets

In this chapter, we will learn how to work with in-memory and large datasets in CNTK.

Training with small in memory datasets

There are many ways to feed data into the CNTK trainer, and the right one depends on the size of the dataset and the format of the data. A dataset can be small enough to fit in memory, or it can be large.

In this section, we are going to work with in-memory datasets. For this, we will use the following two frameworks −

  1. Numpy

  2. Pandas

Using Numpy arrays

Here, we will work with a randomly generated, numpy-based dataset in CNTK. In this example, we simulate data for a binary classification problem. Suppose we have a set of observations with 4 features and want to predict two possible labels with our deep learning model.

Implementation Example

For this, we must first generate a set of labels containing a one-hot vector representation of the labels we want to predict. It can be done with the help of the following steps −

Step 1 − Import the numpy package and set the number of samples as follows −

import numpy as np
num_samples = 20000

Step 2 − Next, generate a label mapping by using the np.eye function as follows −

label_mapping = np.eye(2)

Step 3 − Now, by using the np.random.choice function, draw the labels for the 20000 random samples as follows −

y = label_mapping[np.random.choice(2,num_samples)].astype(np.float32)

Step 4 − At last, by using the np.random.random function, generate an array of random floating-point values as follows −

x = np.random.random(size=(num_samples, 4)).astype(np.float32)

Note that both arrays are converted to 32-bit floating-point numbers with astype(np.float32), so that they match the format expected by CNTK. Next, let us build the network with the help of the following steps −

Step 5 − Import the Dense and Sequential layer functions from the cntk.layers module as follows −

from cntk.layers import Dense, Sequential

Step 6 − Now, we need to import the activation function for the layers in the network, along with input_variable and default_options. Let us import sigmoid as the activation function −

from cntk import input_variable, default_options
from cntk.ops import sigmoid

Step 7 − Now, we need to import the loss function to train the network. Let us import binary_cross_entropy as the loss function −

from cntk.losses import binary_cross_entropy

Step 8 − Next, we need to define the default options for the network. Here, we provide the sigmoid activation function as the default setting. We also create the model by using the Sequential layer function as follows −

with default_options(activation=sigmoid):
   model = Sequential([Dense(6),Dense(2)])

Step 9 − Next, initialise an input_variable with 4 input features serving as the input for the network.

features = input_variable(4)

Step 10 − Now, to complete the network, we need to connect the features variable to the NN.

z = model(features)

So, now we have a NN. With the help of the following steps, let us train it using the in-memory dataset −

Step 11 − To train this NN, we first need to import a learner from the cntk.learners module. We import the sgd learner as follows −

from cntk.learners import sgd

Step 12 − Along with that, import the ProgressPrinter from the cntk.logging module and create an instance of it.

from cntk.logging import ProgressPrinter
progress_writer = ProgressPrinter(0)

Step 13 − Next, define a new input variable for the labels as follows −

labels = input_variable(2)

Step 14 − Next, in order to train the NN model, we need to define a loss using the binary_cross_entropy function, providing it with the model z and the labels variable.

loss = binary_cross_entropy(z, labels)

Step 15 − Next, initialize the sgd learner as follows −

learner = sgd(z.parameters, lr=0.1)

Step 16 − At last, call the train method on the loss function, providing it with the input data, the sgd learner and the progress_writer −

training_summary=loss.train((x,y),parameter_learners=[learner],callbacks=[progress_writer])

Complete implementation example

import numpy as np
num_samples = 20000
label_mapping = np.eye(2)
y = label_mapping[np.random.choice(2,num_samples)].astype(np.float32)
x = np.random.random(size=(num_samples, 4)).astype(np.float32)
from cntk.layers import Dense, Sequential
from cntk import input_variable, default_options
from cntk.ops import sigmoid
from cntk.losses import binary_cross_entropy
with default_options(activation=sigmoid):
   model = Sequential([Dense(6),Dense(2)])
features = input_variable(4)
z = model(features)
from cntk.learners import sgd
from cntk.logging import ProgressPrinter
progress_writer = ProgressPrinter(0)
labels = input_variable(2)
loss = binary_cross_entropy(z, labels)
learner = sgd(z.parameters, lr=0.1)
training_summary=loss.train((x,y),parameter_learners=[learner],callbacks=[progress_writer])

Output

Build info:
     Built time: *** ** **** 21:40:10
     Last modified date: *** *** ** 21:08:46 2019
     Build type: Release
     Build target: CPU-only
     With ASGD: yes
     Math lib: mkl
     Build Branch: HEAD
     Build SHA1:ae9c9c7c5f9e6072cc9c94c254f816dbdc1c5be6 (modified)
     MPI distribution: Microsoft MPI
     MPI version: 7.0.12437.6
-------------------------------------------------------------------
average   since   average   since examples
loss      last    metric    last
------------------------------------------------------
Learning rate per minibatch: 0.1
1.52      1.52      0         0     32
1.51      1.51      0         0     96
1.48      1.46      0         0    224
1.45      1.42      0         0    480
1.42       1.4      0         0    992
1.41      1.39      0         0   2016
1.4       1.39      0         0   4064
1.39      1.39      0         0   8160
1.39      1.39      0         0  16352
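The loss in the output above settles near 1.39. One plausible reading, assuming the loss sums the binary cross-entropy over both output units, is that the labels are random, so the best the network can do is predict 0.5 for each unit. The helper name below is hypothetical and only illustrates that arithmetic with NumPy; it is not CNTK code.

```python
import numpy as np

def binary_cross_entropy_sum(p, y):
    """Sum of the per-output binary cross-entropy terms for one sample."""
    p = np.asarray(p, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    return float(-np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

# Random labels carry no signal, so an uninformed network outputs 0.5 twice.
uninformed_loss = binary_cross_entropy_sum([0.5, 0.5], [1.0, 0.0])
print(round(uninformed_loss, 2))  # 1.39, i.e. 2 * ln(2)
```

Under that assumption, a loss that stops at about 1.39 indicates that the model has learned nothing beyond the class balance, which is expected for randomly generated labels.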

Using Pandas DataFrames

NumPy arrays are one of the most basic ways of storing data, and they are limited in what they can contain: a single n-dimensional array holds data of a single data type. Many real-world cases, however, need a library that can handle more than one data type in a single dataset.

Pandas is a Python library that makes it easier to work with such datasets. It introduces the concept of a DataFrame (DF) and allows us to load datasets stored on disk in various formats, such as CSV, JSON and Excel, as DFs.

You can learn about the Python Pandas library in more detail at https://www.tutorialspoint.com/python_pandas/index.htm.

Implementation Example

In this example, we classify iris flowers into three possible species based on four properties. We created this deep learning model in the previous sections too. The model is as follows −

from cntk.layers import Dense, Sequential
from cntk import input_variable, default_options
from cntk.ops import sigmoid, log_softmax
from cntk.losses import binary_cross_entropy
model = Sequential([
    Dense(4, activation=sigmoid),
    Dense(3, activation=log_softmax)
])
features = input_variable(4)
z = model(features)

The above model contains one hidden layer and an output layer with three neurons to match the number of classes we can predict.

Next, we will use the train method and a loss function to train the network. For this, we must first load and preprocess the iris dataset so that it matches the layout and data format the NN expects. This can be done with the following steps −

Step 1 − Import the numpy and pandas packages as follows −

import numpy as np
import pandas as pd

Step 2 − Next, use the read_csv function to load the dataset into memory −

df_source = pd.read_csv('iris.csv', names = ['sepal_length', 'sepal_width',
 'petal_length', 'petal_width', 'species'], index_col=False)

Step 3 − Now, we need to create a dictionary that maps the labels in the dataset to their corresponding numeric representation.

label_mapping = {'Iris-Setosa' : 0, 'Iris-Versicolor' : 1, 'Iris-Virginica' : 2}

Step 4 − Now, using the iloc indexer on the DataFrame, select the first four columns as follows −

x = df_source.iloc[:, :4].values

Step 5 − Next, we need to select the species column as the labels for the dataset. It can be done as follows −

y = df_source['species'].values

Step 6 − Now, we need to map the labels in the dataset to numbers, which can be done by using label_mapping, and then convert them into one-hot encoded arrays with a one_hot helper function.

y = np.array([one_hot(label_mapping[v], 3) for v in y])

Step 7 − Next, to use the features and the mapped labels with CNTK, we need to convert them both to floats −

x = x.astype(np.float32)
y = y.astype(np.float32)

The labels are stored in the dataset as strings, and CNTK cannot work with strings. That is why it needs one-hot encoded vectors to represent the labels. For this, we can define a function, say one_hot, as follows. Note that it must be defined before the line in Step 6 that calls it −

def one_hot(index, length):
    result = np.zeros(length)
    result[index] = 1  # mark the position of the class with a 1
    return result
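As an alternative sketch, the same encoding can be produced for a whole array of class indices at once by indexing an identity matrix, the approach the earlier NumPy example took with np.eye(2). The function name one_hot_batch is hypothetical.

```python
import numpy as np

def one_hot_batch(indices, num_classes):
    """Return one row per index, with a 1.0 in the column of that class."""
    return np.eye(num_classes, dtype=np.float32)[np.asarray(indices)]

encoded = one_hot_batch([0, 2, 1], 3)
print(encoded)
```

Each row of the result contains exactly one 1.0, and the array is already float32, so no separate astype call is needed for the labels.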

Now that we have the numpy arrays in the correct format, we can use them to train our model with the following steps −

Step 8 − First, we need to import the loss function to train the network. Because this is a three-class problem, let us import cross_entropy_with_softmax as the loss function −

from cntk.losses import cross_entropy_with_softmax

Step 9 − To train this NN, we also need to import a learner from the cntk.learners module. We will import the sgd learner as follows −

from cntk.learners import sgd

Step 10 − Along with that, import the ProgressPrinter from the cntk.logging module and create an instance of it −

from cntk.logging import ProgressPrinter
progress_writer = ProgressPrinter(0)

Step 11 − Next, define a new input variable for the labels as follows −

labels = input_variable(3)

Step 12 − In order to train the NN model, we next need to define a loss using the cross_entropy_with_softmax function, providing the model z and the labels variable −

loss = cross_entropy_with_softmax(z, labels)

Step 13 − Next, initialise the sgd learner as follows −

learner = sgd(z.parameters, 0.1)

Step 14 − Finally, call the train method on the loss function, passing it the input data, the sgd learner and the progress_writer −

training_summary=loss.train((x,y),parameter_learners=[learner],callbacks=
[progress_writer],minibatch_size=16,max_epochs=5)

Complete implementation example

from cntk.layers import Dense, Sequential
from cntk import input_variable, default_options
from cntk.ops import sigmoid, log_softmax
# Model: one hidden layer and a three-neuron output layer
model = Sequential([
    Dense(4, activation=sigmoid),
    Dense(3, activation=log_softmax)
])
features = input_variable(4)
z = model(features)
import numpy as np
import pandas as pd
# The helper must be defined before it is used below
def one_hot(index, length):
    result = np.zeros(length)
    result[index] = 1
    return result
# Load and preprocess the data
df_source = pd.read_csv('iris.csv', names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'], index_col=False)
label_mapping = {'Iris-Setosa' : 0, 'Iris-Versicolor' : 1, 'Iris-Virginica' : 2}
x = df_source.iloc[:, :4].values
y = df_source['species'].values
y = np.array([one_hot(label_mapping[v], 3) for v in y])
x = x.astype(np.float32)
y = y.astype(np.float32)
# Train the model
from cntk.losses import cross_entropy_with_softmax
from cntk.learners import sgd
from cntk.logging import ProgressPrinter
progress_writer = ProgressPrinter(0)
labels = input_variable(3)
loss = cross_entropy_with_softmax(z, labels)
learner = sgd(z.parameters, 0.1)
training_summary=loss.train((x,y),parameter_learners=[learner],callbacks=[progress_writer],minibatch_size=16,max_epochs=5)

Output

Build info:
     Built time: *** ** **** 21:40:10
     Last modified date: *** *** ** 21:08:46 2019
     Build type: Release
     Build target: CPU-only
     With ASGD: yes
     Math lib: mkl
     Build Branch: HEAD
     Build SHA1:ae9c9c7c5f9e6072cc9c94c254f816dbdc1c5be6 (modified)
     MPI distribution: Microsoft MPI
     MPI version: 7.0.12437.6
-------------------------------------------------------------------
average    since    average   since   examples
loss        last     metric   last
------------------------------------------------------
Learning rate per minibatch: 0.1
1.1         1.1        0       0      16
0.835     0.704        0       0      32
1.993      1.11        0       0      48
1.14       1.14        0       0     112
[………]

Training with large datasets

In the previous section, we worked with small in-memory datasets using NumPy and pandas, but not all datasets are so small. Datasets containing images, videos or sound samples in particular are large. To work with such large datasets, CNTK provides MinibatchSource, a component that can load data in chunks. Some of the features of the MinibatchSource component are as follows −

  1. MinibatchSource can help prevent the NN from overfitting by automatically randomizing the samples read from the data source.

  2. It has a built-in transformation pipeline which can be used to augment the data.

  3. It loads the data on a background thread separate from the training process.
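To make the idea of randomized chunks concrete, here is a toy, in-memory sketch in plain Python. It is not how MinibatchSource is implemented (the real component reads from disk on a background thread); the function name iter_minibatches and its seed parameter are hypothetical.

```python
import random

def iter_minibatches(samples, minibatch_size, seed=0):
    """Yield lists of at most minibatch_size samples in a shuffled order."""
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    for start in range(0, len(order), minibatch_size):
        yield [samples[i] for i in order[start:start + minibatch_size]]

batches = list(iter_minibatches(list(range(10)), minibatch_size=4))
print([len(b) for b in batches])  # [4, 4, 2]
```

Every sample appears exactly once per pass, and only the last minibatch may be smaller than the requested size.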

In the following sections, we are going to explore how to use a minibatch source with out-of-memory data to work with large datasets. We will also explore how to use it to feed data into the training of a NN.

Creating MinibatchSource instance

In the previous section, we used the iris flower example and worked with a small in-memory dataset using Pandas DataFrames. Here, we will replace the code that uses data from a pandas DF with MinibatchSource. First, we need to create an instance of MinibatchSource with the following steps −

Implementation Example

Step 1 − First, import the components for the minibatch source from the cntk.io module as follows −

from cntk.io import StreamDef, StreamDefs, MinibatchSource, CTFDeserializer, INFINITELY_REPEAT

Step 2 − Now, using the StreamDef class, create a stream definition for the labels.

labels_stream = StreamDef(field='labels', shape=3, is_sparse=False)

Step 3 − Next, to read the features field from the input file, create another instance of StreamDef as follows.

feature_stream = StreamDef(field='features', shape=4, is_sparse=False)

Step 4 − Now, we need to provide the iris.ctf file as input and initialise the deserializer as follows −

deserializer = CTFDeserializer('iris.ctf', StreamDefs(labels=
labels_stream, features=feature_stream))

Step 5 − Finally, we need to create an instance of MinibatchSource by using the deserializer as follows −

minibatch_source = MinibatchSource(deserializer, randomize=True)

Creating a MinibatchSource instance - Complete implementation example

from cntk.io import StreamDef, StreamDefs, MinibatchSource, CTFDeserializer, INFINITELY_REPEAT
labels_stream = StreamDef(field='labels', shape=3, is_sparse=False)
feature_stream = StreamDef(field='features', shape=4, is_sparse=False)
deserializer = CTFDeserializer('iris.ctf', StreamDefs(labels=labels_stream, features=feature_stream))
minibatch_source = MinibatchSource(deserializer, randomize=True)

Creating CTF file

As seen above, we take the data from the 'iris.ctf' file. Its file format is called CNTK Text Format (CTF). The MinibatchSource instance we created above reads its data from a CTF file, so we must create one. Let us see how to create a CTF file.

Implementation Example

Step 1 − First, we need to import the pandas and numpy packages as follows −

import pandas as pd
import numpy as np

Step 2 − Next, we need to load our data file, i.e. iris.csv, into memory and store it in the df_source variable.

df_source = pd.read_csv('iris.csv', names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'], index_col=False)

Step 3 − Now, using the iloc indexer, take the content of the first four columns as the features. Also, use the data from the species column as the labels, as follows −

features = df_source.iloc[: , :4].values
labels = df_source['species'].values

Step 4 − Next, we need to create a mapping between the label name and its numeric representation. It can be done by creating label_mapping as follows −

label_mapping = {'Iris-Setosa' : 0, 'Iris-Versicolor' : 1, 'Iris-Virginica' : 2}

Step 5 − Now, convert the labels to a set of one-hot encoded vectors as follows −

labels = [one_hot(label_mapping[v], 3) for v in labels]

As we did before, create a utility function called one_hot to encode the labels; it must be defined before the line above runs. It can be done as follows −

def one_hot(index, length):
    result = np.zeros(length)
    result[index] = 1
    return result

Now that we have loaded and preprocessed the data, it is time to store it on disk in the CTF file format. Each line holds one sample, and every field starts with a | character followed by the field name. We can do it with the following Python code −

with open('iris.ctf', 'w') as output_file:
    for index in range(0, features.shape[0]):
        feature_values = ' '.join([str(x) for x in np.nditer(features[index])])
        label_values = ' '.join([str(x) for x in np.nditer(labels[index])])
        output_file.write('|features {} |labels {}\n'.format(feature_values, label_values))

Creating a CTF file - Complete implementation example

import pandas as pd
import numpy as np
# The helper must be defined before it is used below
def one_hot(index, length):
    result = np.zeros(length)
    result[index] = 1
    return result
df_source = pd.read_csv('iris.csv', names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'], index_col=False)
features = df_source.iloc[: , :4].values
labels = df_source['species'].values
label_mapping = {'Iris-Setosa' : 0, 'Iris-Versicolor' : 1, 'Iris-Virginica' : 2}
labels = [one_hot(label_mapping[v], 3) for v in labels]
# Write one sample per line: |features ... |labels ...
with open('iris.ctf', 'w') as output_file:
    for index in range(0, features.shape[0]):
        feature_values = ' '.join([str(x) for x in np.nditer(features[index])])
        label_values = ' '.join([str(x) for x in np.nditer(labels[index])])
        output_file.write('|features {} |labels {}\n'.format(feature_values, label_values))
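To check what a single CTF line should look like without touching the disk, the formatting can be isolated in a small pure-Python helper. This is a sketch, assuming the field names features and labels used by the stream definitions earlier; the function name to_ctf_line is hypothetical.

```python
def to_ctf_line(features, label_index, num_classes):
    """Format one sample as a CTF line with |features and |labels fields."""
    one_hot = [1 if i == label_index else 0 for i in range(num_classes)]
    feature_text = ' '.join(str(v) for v in features)
    label_text = ' '.join(str(v) for v in one_hot)
    return '|features {} |labels {}'.format(feature_text, label_text)

line = to_ctf_line([5.1, 3.5, 1.4, 0.2], label_index=0, num_classes=3)
print(line)  # |features 5.1 3.5 1.4 0.2 |labels 1 0 0
```

The field names after each | must match the field arguments given to StreamDef, otherwise the deserializer cannot find the streams.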

Feeding the data

Once we have created the MinibatchSource instance, we can train the model with it. We can use the same training logic as when we worked with small in-memory datasets. Here, we pass the MinibatchSource instance as the input to the train method on the loss function as follows −

Implementation Example

Step 1 − In order to log the output of the training session, first import the ProgressPrinter from the cntk.logging module as follows −

from cntk.logging import ProgressPrinter

Step 2 − Next, to set up the training session, import Trainer and training_session from the cntk.train module as follows −

from cntk.train import Trainer, training_session

Step 3 − Now, we need to define a set of constants, namely minibatch_size, samples_per_epoch and num_epochs, as follows −

minibatch_size = 16
samples_per_epoch = 150
num_epochs = 30

Step 4 − Next, so that CNTK knows how to read data during training, we need to define a mapping between the input variables of the network and the streams in the minibatch source.

input_map = {
     features: minibatch_source.streams.features,
     labels: minibatch_source.streams.labels
}

Step 5 − Next, to log the output of the training process, initialise the progress_writer variable with a new ProgressPrinter instance as follows −

progress_writer = ProgressPrinter(0)

Step 6 − Finally, we need to invoke the train method on the loss as follows −

train_history = loss.train(minibatch_source,
    parameter_learners=[learner],
    model_inputs_to_streams=input_map,
    callbacks=[progress_writer],
    epoch_size=samples_per_epoch,
    max_epochs=num_epochs)

Feeding the data - Complete implementation example

from cntk.logging import ProgressPrinter
from cntk.train import Trainer, training_session
minibatch_size = 16
samples_per_epoch = 150
num_epochs = 30
# Map each network input to the matching stream of the minibatch source
input_map = {
   features: minibatch_source.streams.features,
   labels: minibatch_source.streams.labels
}
progress_writer = ProgressPrinter(0)
train_history = loss.train(minibatch_source,
    parameter_learners=[learner],
    model_inputs_to_streams=input_map,
    callbacks=[progress_writer],
    epoch_size=samples_per_epoch,
    max_epochs=num_epochs)

Output

-------------------------------------------------------------------
average   since   average   since  examples
loss      last     metric   last
------------------------------------------------------
Learning rate per minibatch: 0.1
1.21      1.21      0        0       32
1.15      0.12      0        0       96
[………]

CNTK - Measuring Performance

This chapter explains how to measure model performance in CNTK.

Strategy to validate model performance

After building a ML model, we train it using a set of data samples. Through this training, the model learns and derives some general rules. What matters is how the model performs when we feed it new samples, i.e., samples different from those provided at training time. The model behaves differently in that case, and it may be worse at making good predictions on those new samples.

But the model must work well for new samples too, because in a production environment we will get different input than the sample data we used for training. That is why we should validate the ML model using a set of samples different from the samples we used for training. Here, we are going to discuss two different techniques for creating a dataset for validating a NN.

Hold-out dataset

It is one of the easiest methods for creating a dataset to validate a NN. As the name implies, in this method we hold back one set of samples from training (say 20%) and use it to test the performance of our ML model. The following diagram shows the ratio between training and validation samples −

[Figure: dataset split into training and validation samples]

The hold-out dataset technique ensures that we have enough data to train our ML model while keeping a reasonable number of samples to get a good measurement of the model's performance.

It is good practice to choose the samples for the training set and the test set at random from the main dataset. This ensures an even distribution between the training and test sets.
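The random 80/20 split described above can be sketched by hand with NumPy. This is only an illustration with toy data; the function name hold_out_split and its parameters are hypothetical.

```python
import numpy as np

def hold_out_split(x, y, test_fraction=0.2, seed=1):
    """Shuffle the sample indices, then hold back test_fraction of them."""
    rng = np.random.RandomState(seed)
    order = rng.permutation(len(x))
    n_test = int(round(len(x) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return x[train_idx], x[test_idx], y[train_idx], y[test_idx]

x = np.arange(40, dtype=np.float32).reshape(10, 4)  # 10 toy samples
y = np.arange(10)
x_train, x_test, y_train, y_test = hold_out_split(x, y)
print(x_train.shape, x_test.shape)  # (8, 4) (2, 4)
```

Shuffling the indices rather than the arrays keeps each feature row paired with its label.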

Following is an example in which we produce our own hold-out dataset by using the train_test_split function from the scikit-learn library.

Example

from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# test_size=0.2 holds back 20% of the data as test data.
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics
classifier_knn = KNeighborsClassifier(n_neighbors=3)
classifier_knn.fit(X_train, y_train)
y_pred = classifier_knn.predict(X_test)
# Predict the species of two new, hand-written samples
sample = [[5, 5, 3, 2], [2, 4, 3, 5]]
preds = classifier_knn.predict(sample)
pred_species = [iris.target_names[p] for p in preds]
print("Predictions:", pred_species)

Output

Predictions: ['versicolor', 'virginica']

While using CNTK, we need to randomise the order of our dataset each time we train our model because −

  1. Deep learning algorithms are highly influenced by the random-number generators.

  2. The order in which we provide the samples to NN during training greatly affects its performance.

The major downside of the hold-out dataset technique is that it is unreliable: the measured performance depends on which samples happen to land in the test set, so sometimes we get very good results and sometimes we get bad results.

K-fold cross validation

To measure the performance of our ML model more reliably, there is a technique called K-fold cross validation. In essence, K-fold cross validation is the same as the previous technique, but it repeats the process several times, usually about 5 to 10 times. The following diagram represents its concept −

[Figure: k-fold cross validation]

Working of K-fold cross validation

The working of K-fold cross validation can be understood with the following steps −

Step 1 − As in the hold-out dataset technique, in the K-fold cross validation technique we first need to split the dataset into a training set and a test set. Ideally, the ratio is 80-20, i.e. 80% training set and 20% test set.

Step 2 − Next, we need to train our model using the training set.

Step 3 − Finally, we use the test set to measure the performance of our model. The only difference between the hold-out dataset technique and the k-fold cross validation technique is that the above process is repeated, usually 5 to 10 times, and at the end the average is calculated over all the performance metrics. That average is the final performance metric.

Let us see an example with a small dataset −

Example

from numpy import array
from sklearn.model_selection import KFold
data = array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
kfold = KFold(n_splits=5, shuffle=True, random_state=1)
for train, test in kfold.split(data):
   print('train: %s, test: %s' % (data[train], data[test]))

Output

train: [0.1 0.2 0.4 0.5 0.6 0.7 0.8 0.9], test: [0.3 1. ]
train: [0.1 0.2 0.3 0.4 0.6 0.8 0.9 1. ], test: [0.5 0.7]
train: [0.2 0.3 0.5 0.6 0.7 0.8 0.9 1. ], test: [0.1 0.4]
train: [0.1 0.3 0.4 0.5 0.6 0.7 0.9 1. ], test: [0.2 0.8]
train: [0.1 0.2 0.3 0.4 0.5 0.7 0.8 1. ], test: [0.6 0.9]

As we can see, because it uses a more realistic training and test scenario, the k-fold cross validation technique gives us a much more stable performance measurement. On the downside, it takes a lot of time when validating deep learning models.

CNTK does not support k-fold cross validation out of the box, hence we need to write our own script to do so.
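Such a script could take roughly the following shape. This is a minimal sketch using only NumPy: the names k_fold_indices, cross_validate and train_and_score are hypothetical, and the scorer passed in at the end is a placeholder where a real script would train a fresh CNTK model on the training indices and return its metric on the test indices.

```python
import numpy as np

def k_fold_indices(num_samples, k=5, seed=1):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    order = np.random.RandomState(seed).permutation(num_samples)
    folds = np.array_split(order, k)
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, folds[i]

def cross_validate(train_and_score, num_samples, k=5):
    """Average the score returned by train_and_score over all k folds."""
    scores = [train_and_score(tr, te) for tr, te in k_fold_indices(num_samples, k)]
    return float(np.mean(scores))

# Placeholder scorer: returns the share of samples in the test fold.
mean_score = cross_validate(lambda tr, te: len(te) / (len(tr) + len(te)), 10)
print(mean_score)
```

Because a new model must be trained for every fold, the total cost is roughly k times that of a single hold-out run, which is the time penalty mentioned above.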

Detecting underfitting and overfitting

Whether we use the hold-out dataset or the k-fold cross-validation technique, we will discover that the metrics differ between the dataset used for training and the dataset used for validation.

Detecting overfitting

Overfitting is a situation where our ML model models the training data exceptionally well but fails to perform well on the testing data, i.e. it is not able to predict the test data.

It happens when a ML model learns specific patterns and noise from the training data to such an extent that it negatively impacts the model's ability to generalise from the training data to new, i.e. unseen, data. Here, noise is the irrelevant information or randomness in a dataset.

Following are the two ways in which we can detect whether our model is overfit or not −

  1. The overfit model will perform well on the same samples we used for training, but it will perform very badly on new samples, i.e. samples different from the training samples.

  2. During validation, the model is overfit if the metric on the test set is lower than the same metric on our training set.

Detecting underfitting

Another situation that can arise in ML is underfitting. This is a situation where our ML model does not model the training data well and fails to predict useful output. When we start training the first epoch, our model will be underfitting, but it will become less underfit as training progresses.

One of the ways to detect whether our model is underfit or not is to look at the metrics for the training set and the test set. Our model is underfit if the metric on the test set is higher than the metric on the training set.
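The two rules of thumb above can be written down as a small helper. This is a rough sketch, assuming an accuracy-like metric where higher is better; the function name diagnose_fit and the tolerance value are hypothetical and should be tuned to the noise level of the metric.

```python
def diagnose_fit(train_metric, test_metric, tolerance=0.02):
    """Compare an accuracy-like metric on the training and test sets."""
    gap = train_metric - test_metric
    if gap > tolerance:
        return 'possible overfitting'   # test metric clearly lower
    if gap < -tolerance:
        return 'possible underfitting'  # test metric clearly higher
    return 'no clear sign'

print(diagnose_fit(0.99, 0.80))  # possible overfitting
print(diagnose_fit(0.70, 0.78))  # possible underfitting
```

A single comparison is only a hint; tracking both metrics over several epochs gives a more dependable picture.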

CNTK - Neural Network Classification

In this chapter, we will study how to build a neural network classifier using CNTK.

Introduction

Classification may be defined as the process of predicting categorical output labels or responses for the given input data. The categorised output, which is based on what the model has learned in the training phase, can take forms such as "Black" or "White", or "spam" or "no spam".

Mathematically, it is the task of approximating a mapping function, say f, from input variables, say X, to output variables, say Y.

A classic example of a classification problem is spam detection in e-mails. There can be only two categories of output, "spam" and "no spam".

To implement such a classification, we first need to train the classifier, using "spam" and "no spam" emails as the training data. Once the classifier has been trained successfully, it can be used to classify an unknown email.

Here, we are going to create a 4-5-3 NN using the iris flower dataset, having the following −

  1. 4-input nodes (one for each predictor value).

  2. 5-hidden processing nodes.

  3. 3-output nodes (because there are three possible species in iris dataset).
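To see what the 4-5-3 shape means in terms of weights, here is a NumPy sketch of a forward pass through such a network. It is an illustration rather than CNTK code: the tanh hidden activation, the softmax output and the small uniform initial weights are assumptions made for this sketch.

```python
import numpy as np

rng = np.random.RandomState(1)
w1, b1 = rng.uniform(-0.01, 0.01, (4, 5)), np.zeros(5)  # input -> hidden
w2, b2 = rng.uniform(-0.01, 0.01, (5, 3)), np.zeros(3)  # hidden -> output

def forward(x):
    """Return the three class probabilities for one four-value sample."""
    hidden = np.tanh(x @ w1 + b1)
    scores = hidden @ w2 + b2
    exps = np.exp(scores - scores.max())
    return exps / exps.sum()

probs = forward(np.array([5.1, 3.5, 1.4, 0.2]))
num_params = w1.size + b1.size + w2.size + b2.size
print(probs.shape, num_params)  # (3,) 43
```

The network has (4 x 5 + 5) + (5 x 3 + 3) = 43 trainable values, and before training the three probabilities are all close to one third.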

Loading Dataset

We will be using the iris flower dataset, from which we want to classify species of iris flowers based on the physical properties of sepal width and length, and petal width and length. The dataset describes the following properties of different varieties of iris flowers −

  1. Sepal length

  2. Sepal width

  3. Petal length

  4. Petal width

  5. Class i.e. iris setosa or iris versicolor or iris virginica

We have the iris.csv file which we used in previous chapters too. It can be loaded with the help of the Pandas library. But before using it for our classifier, we need to prepare the training and test files so that they can be used easily with CNTK.

Preparing training & test files

The iris dataset is one of the most popular datasets for ML projects. It has 150 data items, and the raw data looks as follows −

5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
…
7.0 3.2 4.7 1.4 versicolor
6.4 3.2 4.5 1.5 versicolor
…
6.3 3.3 6.0 2.5 virginica
5.8 2.7 5.1 1.9 virginica

As mentioned earlier, the first four values on each line describe the physical properties of the flower, i.e. sepal length, sepal width, petal length and petal width. The last value is the species.

However, we have to convert the data into a format that CNTK can use easily, namely a .ctf file (we also created an iris.ctf file in the previous section). It looks as follows −

|attribs 5.1 3.5 1.4 0.2|species 1 0 0
|attribs 4.9 3.0 1.4 0.2|species 1 0 0
…
|attribs 7.0 3.2 4.7 1.4|species 0 1 0
|attribs 6.4 3.2 4.5 1.5|species 0 1 0
…
|attribs 6.3 3.3 6.0 2.5|species 0 0 1
|attribs 5.8 2.7 5.1 1.9|species 0 0 1

In the above data, the |attribs tag marks the start of the feature values and the |species tag marks the class label values. We can use any other tag names we wish, and we can also add an item ID. For example, look at the following data −

|ID 001 |attribs 5.1 3.5 1.4 0.2|species 1 0 0 |#setosa
|ID 002 |attribs 4.9 3.0 1.4 0.2|species 1 0 0 |#setosa
…
|ID 051 |attribs 7.0 3.2 4.7 1.4|species 0 1 0 |#versicolor
|ID 052 |attribs 6.4 3.2 4.5 1.5|species 0 1 0 |#versicolor
…
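To make the line layout explicit, here is a small pure-Python sketch that splits one such line back into its named fields. It is only an illustration of the format, not the parser CNTK uses; the function name parse_ctf_line is hypothetical, and fields whose name starts with # are treated as comments.

```python
def parse_ctf_line(line):
    """Split one CTF line into {field_name: [values]}, skipping comments."""
    fields = {}
    for chunk in line.split('|')[1:]:
        parts = chunk.split()
        if not parts or parts[0].startswith('#'):
            continue  # empty chunk or comment field such as |#setosa
        fields[parts[0]] = [float(v) for v in parts[1:]]
    return fields

row = parse_ctf_line('|ID 001 |attribs 5.1 3.5 1.4 0.2|species 1 0 0 |#setosa')
print(row['attribs'], row['species'])  # [5.1, 3.5, 1.4, 0.2] [1.0, 0.0, 0.0]
```

Each | starts a new field, the first token after it is the field name, and the remaining tokens on that field are its values.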

There are 150 data items in total in the iris dataset. For this example, we will use the 80-20 hold-out dataset rule, i.e. 80% of the data items (120 items) for training and the remaining 20% (30 items) for testing.

Constructing Classification model

First, we need to read the data files in CNTK (CTF) format. For that, we are going to use a helper function named create_reader as follows −

import numpy as np
import cntk as C
def create_reader(path, input_dim, output_dim, rnd_order, sweeps):
    x_strm = C.io.StreamDef(field='attribs', shape=input_dim, is_sparse=False)
    y_strm = C.io.StreamDef(field='species', shape=output_dim, is_sparse=False)
    streams = C.io.StreamDefs(x_src=x_strm, y_src=y_strm)
    deserial = C.io.CTFDeserializer(path, streams)
    mb_src = C.io.MinibatchSource(deserial, randomize=rnd_order, max_sweeps=sweeps)
    return mb_src

Now, we need to set the architecture arguments for our NN and also provide the location of the data files. It can be done with the following Python code −

def main():
    print("Using CNTK version = " + str(C.__version__) + "\n")
    input_dim = 4
    hidden_dim = 5
    output_dim = 3
    train_file = ".\\...\\" #provide the name of the training file(120 data items)
    test_file = ".\\...\\" #provide the name of the test file(30 data items)

Now, with the following lines of code, which continue the body of main(), our program creates the untrained NN −

    X = C.ops.input_variable(input_dim, np.float32)
    Y = C.ops.input_variable(output_dim, np.float32)
    with C.layers.default_options(init=C.initializer.uniform(scale=0.01, seed=1)):
        hLayer = C.layers.Dense(hidden_dim, activation=C.ops.tanh, name='hidLayer')(X)
        oLayer = C.layers.Dense(output_dim, activation=None, name='outLayer')(hLayer)
    nnet = oLayer
    model = C.ops.softmax(nnet)

Once we have created both untrained model objects (nnet, which produces the raw output values and is used for training, and model, which applies softmax and is used for prediction), we need to set up a Learner algorithm object and then use it to create a Trainer object. We will use the SGD learner and the cross_entropy_with_softmax loss function −

tr_loss = C.cross_entropy_with_softmax(nnet, Y)
tr_clas = C.classification_error(nnet, Y)
max_iter = 2000
batch_size = 10
learn_rate = 0.01
learner = C.sgd(nnet.parameters, learn_rate)
trainer = C.Trainer(nnet, (tr_loss, tr_clas), [learner])
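
To see what the loss function measures, here is a plain-Python sketch of softmax followed by cross entropy. It is a from-scratch illustration, not CNTK's implementation; the function names simply mirror the CNTK ones for readability.

```python
import math

def softmax(z):
    """Turn raw output values into probabilities that sum to 1."""
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_with_softmax(z, one_hot):
    """Cross entropy between softmax(z) and a one-hot target."""
    probs = softmax(z)
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

# With three equal raw outputs every class gets probability 1/3,
# so the loss is ln(3), about 1.0986.
loss = cross_entropy_with_softmax([0.0, 0.0, 0.0], [1, 0, 0])
print("%0.4f" % loss)
```

A freshly initialised network with very small weights produces nearly equal raw outputs, which is consistent with the first loss value of about 1.0986 in the sample output shown later in this chapter.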

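Alternatively, a learner with a learning-rate schedule and momentum can be coded as follows. Because these lines reassign learner and trainer, they replace the plain SGD setup shown above −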

max_iter = 2000
batch_size = 10
lr_schedule = C.learning_parameter_schedule_per_sample([(1000, 0.05), (1, 0.01)])
mom_sch = C.momentum_schedule([(100, 0.99), (0, 0.95)], batch_size)
learner = C.fsadagrad(nnet.parameters, lr=lr_schedule, momentum=mom_sch)
trainer = C.Trainer(nnet, (tr_loss, tr_clas), [learner])

Once the Trainer object is ready, we need to create a reader to read the training data −

rdr = create_reader(train_file, input_dim, output_dim, rnd_order=True, sweeps=C.io.INFINITELY_REPEAT)
iris_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }

Now it's time to train our NN model −

for i in range(0, max_iter):
    curr_batch = rdr.next_minibatch(batch_size, input_map=iris_input_map)
    trainer.train_minibatch(curr_batch)
    if i % 500 == 0:
        mcee = trainer.previous_minibatch_loss_average
        macc = (1.0 - trainer.previous_minibatch_evaluation_average) * 100
        print("batch %4d: mean loss = %0.4f, accuracy = %0.2f%% " % (i, mcee, macc))

Once training is done, let's evaluate the model using the test data items −

print("\nEvaluating test data \n")
rdr = create_reader(test_file, input_dim, output_dim, rnd_order=False, sweeps=1)
iris_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }
num_test = 30
all_test = rdr.next_minibatch(num_test, input_map=iris_input_map)
acc = (1.0 - trainer.test_minibatch(all_test)) * 100
print("Classification accuracy = %0.2f%%" % acc)

After evaluating the accuracy of our trained NN model, we will use it to make a prediction on unseen data −

np.set_printoptions(precision = 1, suppress=True)
unknown = np.array([[6.4, 3.2, 4.5, 1.5]], dtype=np.float32)
print("\nPredicting Iris species for input features: ")
print(unknown[0])
pred_prob = model.eval(unknown)
np.set_printoptions(precision = 4, suppress=True)
print("Prediction probabilities are: ")
print(pred_prob[0])
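
The program prints raw probabilities and leaves the final decision to the reader. A minimal sketch of turning them into a species name follows. The probability values are the ones from the sample output of this chapter; the label order, and virginica as the third class, are assumptions, since the data excerpt only shows setosa (1 0 0) and versicolor (0 1 0).

```python
SPECIES = ["setosa", "versicolor", "virginica"]  # assumed order of the one-hot positions

def predicted_species(probabilities):
    """Return the name and probability of the most likely class."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return SPECIES[best], probabilities[best]

name, prob = predicted_species([0.0847, 0.736, 0.113])
print(name, prob)
```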

Complete Classification Model

import numpy as np
import cntk as C

def create_reader(path, input_dim, output_dim, rnd_order, sweeps):
    x_strm = C.io.StreamDef(field='attribs', shape=input_dim, is_sparse=False)
    y_strm = C.io.StreamDef(field='species', shape=output_dim, is_sparse=False)
    streams = C.io.StreamDefs(x_src=x_strm, y_src=y_strm)
    deserial = C.io.CTFDeserializer(path, streams)
    mb_src = C.io.MinibatchSource(deserial, randomize=rnd_order, max_sweeps=sweeps)
    return mb_src

def main():
    print("Using CNTK version = " + str(C.__version__) + "\n")
    input_dim = 4
    hidden_dim = 5
    output_dim = 3
    train_file = ".\\...\\"  # provide the name of the training file (120 data items)
    test_file = ".\\...\\"  # provide the name of the test file (30 data items)
    # Untrained network: nnet gives raw outputs, model adds softmax for prediction
    X = C.ops.input_variable(input_dim, np.float32)
    Y = C.ops.input_variable(output_dim, np.float32)
    with C.layers.default_options(init=C.initializer.uniform(scale=0.01, seed=1)):
        hLayer = C.layers.Dense(hidden_dim, activation=C.ops.tanh, name='hidLayer')(X)
        oLayer = C.layers.Dense(output_dim, activation=None, name='outLayer')(hLayer)
    nnet = oLayer
    model = C.ops.softmax(nnet)
    tr_loss = C.cross_entropy_with_softmax(nnet, Y)
    tr_clas = C.classification_error(nnet, Y)
    max_iter = 2000
    batch_size = 10
    learn_rate = 0.01
    learner = C.sgd(nnet.parameters, learn_rate)
    trainer = C.Trainer(nnet, (tr_loss, tr_clas), [learner])
    # The lines below reassign learner and trainer, replacing the plain SGD setup above
    max_iter = 2000
    batch_size = 10
    lr_schedule = C.learning_parameter_schedule_per_sample([(1000, 0.05), (1, 0.01)])
    mom_sch = C.momentum_schedule([(100, 0.99), (0, 0.95)], batch_size)
    learner = C.fsadagrad(nnet.parameters, lr=lr_schedule, momentum=mom_sch)
    trainer = C.Trainer(nnet, (tr_loss, tr_clas), [learner])
    # Train
    rdr = create_reader(train_file, input_dim, output_dim, rnd_order=True, sweeps=C.io.INFINITELY_REPEAT)
    iris_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }
    for i in range(0, max_iter):
        curr_batch = rdr.next_minibatch(batch_size, input_map=iris_input_map)
        trainer.train_minibatch(curr_batch)
        if i % 500 == 0:
            mcee = trainer.previous_minibatch_loss_average
            macc = (1.0 - trainer.previous_minibatch_evaluation_average) * 100
            print("batch %4d: mean loss = %0.4f, accuracy = %0.2f%% " % (i, mcee, macc))
    # Evaluate on the test data
    print("\nEvaluating test data \n")
    rdr = create_reader(test_file, input_dim, output_dim, rnd_order=False, sweeps=1)
    iris_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }
    num_test = 30
    all_test = rdr.next_minibatch(num_test, input_map=iris_input_map)
    acc = (1.0 - trainer.test_minibatch(all_test)) * 100
    print("Classification accuracy = %0.2f%%" % acc)
    # Predict for one item
    np.set_printoptions(precision = 1, suppress=True)
    unknown = np.array([[7.0, 3.2, 4.7, 1.4]], dtype=np.float32)
    print("\nPredicting species for input features: ")
    print(unknown[0])
    pred_prob = model.eval(unknown)
    np.set_printoptions(precision = 4, suppress=True)
    print("Prediction probabilities: ")
    print(pred_prob[0])

if __name__ == "__main__":
    main()

Output

Using CNTK version = 2.7
batch 0: mean loss = 1.0986, mean accuracy = 40.00%
batch 500: mean loss = 0.6677, mean accuracy = 80.00%
batch 1000: mean loss = 0.5332, mean accuracy = 70.00%
batch 1500: mean loss = 0.2408, mean accuracy = 100.00%
Evaluating test data
Classification accuracy = 94.58%
Predicting species for input features:
[7.0 3.2 4.7 1.4]
Prediction probabilities:
[0.0847 0.736 0.113]

Saving the trained model

The Iris dataset has only 150 data items, so training the NN classifier takes only a few seconds. Training on a large dataset with thousands or millions of data items, however, can take hours or even days.

We can save our model so that we don't have to retrain it from scratch. The following Python code saves our trained NN −

nn_classifier = ".\\neuralclassifier.model"  # provide the name of the file
model.save(nn_classifier, format=C.ModelFormat.CNTKv2)

The save() function used above takes the following arguments −

  1. The file name is the first argument of the save() function. It can also be written together with the path of the file.

  2. The second is the format parameter, which has the default value C.ModelFormat.CNTKv2.

Loading the trained model

Once the trained model has been saved, loading it is very easy. We only need to use the load() function. Let's check this in the following example −

import numpy as np
import cntk as C
model = C.ops.functions.Function.load(".\\neuralclassifier.model")
np.set_printoptions(precision = 1, suppress=True)
unknown = np.array([[7.0, 3.2, 4.7, 1.4]], dtype=np.float32)
print("\nPredicting species for input features: ")
print(unknown[0])
pred_prob = model.eval(unknown)
np.set_printoptions(precision = 4, suppress=True)
print("Prediction probabilities: ")
print(pred_prob[0])

The benefit of a saved model is that, once it is loaded, it can be used exactly as if it had just been trained.

CNTK - Neural Network Binary Classification

In this chapter, let us understand what neural network binary classification is and how to perform it using CNTK.

Binary classification using a NN is like multi-class classification, except that there are only two classes to predict. Here, we will perform binary classification with a neural network using two techniques: the two-node technique, which has two output nodes instead of three or more, and the one-node technique, which has a single output node. The one-node technique is more common than the two-node technique.
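
As background that this tutorial does not spell out, the two techniques are closely related: a softmax over two raw outputs gives the same probability as a logistic sigmoid applied to their difference. The sketch below checks this numerically; the raw output values are hypothetical and the helper names are not CNTK functions.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, the usual squashing function for a single output node."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax2(z0, z1):
    """Softmax over exactly two raw output values."""
    m = max(z0, z1)
    e0, e1 = math.exp(z0 - m), math.exp(z1 - m)
    return e0 / (e0 + e1), e1 / (e0 + e1)

z0, z1 = 1.3, -0.4  # hypothetical raw outputs of a two-node network
p_two_node = softmax2(z0, z1)[0]
p_one_node = sigmoid(z0 - z1)
print(p_two_node, p_one_node)  # the two values agree up to rounding
```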

Loading Dataset

For both techniques, we will use the banknote authentication dataset. It can be downloaded from the UCI Machine Learning Repository at https://archive.ics.uci.edu/ml/datasets/banknote+authentication.

For our example, we will use 50 authentic data items, which have class forgery = 0, and the first 50 fake items, which have class forgery = 1.

Preparing training & test files

There are 1372 data items in the full dataset. The raw dataset looks as follows −

3.6216, 8.6661, -2.8073, -0.44699, 0
4.5459, 8.1674, -2.4586, -1.4621, 0
…
-1.3971, 3.3191, -1.3927, -1.9948, 1
0.39012, -0.14279, -0.031994, 0.35084, 1

First, we need to convert this raw data into the two-node CNTK format, which looks as follows −

|stats 3.62160000 8.66610000 -2.80730000 -0.44699000 |forgery 0 1 |# authentic
|stats 4.54590000 8.16740000 -2.45860000 -1.46210000 |forgery 0 1 |# authentic
. . .
|stats -1.39710000 3.31910000 -1.39270000 -1.99480000 |forgery 1 0 |# fake
|stats 0.39012000 -0.14279000 -0.03199400 0.35084000 |forgery 1 0 |# fake

You can use the following Python program to create CNTK-format data from the raw data −

fin = open(".\\...", "r")  # provide the location of the saved dataset text file
for line in fin:
    line = line.strip()
    # Strip each token so that a space after a comma does not break the class test
    tokens = [t.strip() for t in line.split(",")]
    if tokens[4] == "0":
        print("|stats %12.8f %12.8f %12.8f %12.8f |forgery 0 1 |# authentic" %
              (float(tokens[0]), float(tokens[1]), float(tokens[2]), float(tokens[3])))
    else:
        print("|stats %12.8f %12.8f %12.8f %12.8f |forgery 1 0 |# fake" %
              (float(tokens[0]), float(tokens[1]), float(tokens[2]), float(tokens[3])))
fin.close()
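
The same conversion can be sketched as a small function that works on one line at a time, which makes it easy to check against the sample rows above. to_ctf_line is a hypothetical helper. Its one_node option, which writes a single label value (0 for authentic, 1 for fake), is an assumption about the file layout needed by the one-node technique described later in this chapter.

```python
def to_ctf_line(raw_line, one_node=False):
    """Convert one raw 'v1, v2, v3, v4, class' line into a CNTK-format line."""
    tokens = [t.strip() for t in raw_line.strip().split(",")]
    stats = " ".join("%0.8f" % float(t) for t in tokens[:4])
    is_fake = tokens[4] == "1"
    if one_node:
        label = "1" if is_fake else "0"      # assumed one-node layout: a single value
    else:
        label = "1 0" if is_fake else "0 1"  # two-node layout used in this section
    comment = "fake" if is_fake else "authentic"
    return "|stats %s |forgery %s |# %s" % (stats, label, comment)

print(to_ctf_line("3.6216, 8.6661, -2.8073, -0.44699, 0"))
print(to_ctf_line("-1.3971, 3.3191, -1.3927, -1.9948, 1", one_node=True))
```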

Two-node binary Classification model

There is very little difference between two-node classification and multi-class classification. First, we need to read the data files in CNTK format, and for that we will use the helper function named create_reader, as follows −

def create_reader(path, input_dim, output_dim, rnd_order, sweeps):
    # Map the 'stats' and 'forgery' fields of the file to two dense streams
    x_strm = C.io.StreamDef(field='stats', shape=input_dim, is_sparse=False)
    y_strm = C.io.StreamDef(field='forgery', shape=output_dim, is_sparse=False)
    streams = C.io.StreamDefs(x_src=x_strm, y_src=y_strm)
    deserial = C.io.CTFDeserializer(path, streams)
    mb_src = C.io.MinibatchSource(deserial, randomize=rnd_order, max_sweeps=sweeps)
    return mb_src

Now, we need to set the architecture arguments for our NN and provide the location of the data files. This can be done with the following Python code −

def main():
    print("Using CNTK version = " + str(C.__version__) + "\n")
    input_dim = 4
    hidden_dim = 10
    output_dim = 2
    train_file = ".\\...\\"  # provide the name of the training file
    test_file = ".\\...\\"  # provide the name of the test file

Next, the following lines of code create the untrained NN −

X = C.ops.input_variable(input_dim, np.float32)
Y = C.ops.input_variable(output_dim, np.float32)
with C.layers.default_options(init=C.initializer.uniform(scale=0.01, seed=1)):
    hLayer = C.layers.Dense(hidden_dim, activation=C.ops.tanh, name='hidLayer')(X)
    oLayer = C.layers.Dense(output_dim, activation=None, name='outLayer')(hLayer)
nnet = oLayer
model = C.ops.softmax(nnet)

Once we have created both untrained model objects (nnet for training and model, with softmax applied, for prediction), we need to set up a Learner algorithm object and then use it to create a Trainer object. We will use the SGD learner and the cross_entropy_with_softmax loss function −

tr_loss = C.cross_entropy_with_softmax(nnet, Y)
tr_clas = C.classification_error(nnet, Y)
max_iter = 500
batch_size = 10
learn_rate = 0.01
learner = C.sgd(nnet.parameters, learn_rate)
trainer = C.Trainer(nnet, (tr_loss, tr_clas), [learner])

Once the Trainer object is ready, we need to create a reader to read the training data −

rdr = create_reader(train_file, input_dim, output_dim, rnd_order=True, sweeps=C.io.INFINITELY_REPEAT)
banknote_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }

Now, it is time to train our NN model −

for i in range(0, max_iter):
    curr_batch = rdr.next_minibatch(batch_size, input_map=banknote_input_map)
    trainer.train_minibatch(curr_batch)
    if i % 50 == 0:
        mcee = trainer.previous_minibatch_loss_average
        macc = (1.0 - trainer.previous_minibatch_evaluation_average) * 100
        print("batch %4d: mean loss = %0.4f, accuracy = %0.2f%% " % (i, mcee, macc))

Once training is completed, let us evaluate the model using the test data items −

print("\nEvaluating test data \n")
rdr = create_reader(test_file, input_dim, output_dim, rnd_order=False, sweeps=1)
banknote_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }
num_test = 20
all_test = rdr.next_minibatch(num_test, input_map=banknote_input_map)
acc = (1.0 - trainer.test_minibatch(all_test)) * 100
print("Classification accuracy = %0.2f%%" % acc)

After evaluating the accuracy of our trained NN model, we will use it to make a prediction on unseen data −

np.set_printoptions(precision = 1, suppress=True)
unknown = np.array([[0.6, 1.9, -3.3, -0.3]], dtype=np.float32)
print("\nPredicting Banknote authenticity for input features: ")
print(unknown[0])
pred_prob = model.eval(unknown)
np.set_printoptions(precision = 4, suppress=True)
print("Prediction probabilities are: ")
print(pred_prob[0])
if pred_prob[0,0] < pred_prob[0,1]:
    print("Prediction: authentic")
else:
    print("Prediction: fake")

Complete Two-node Classification Model

import numpy as np
import cntk as C

def create_reader(path, input_dim, output_dim, rnd_order, sweeps):
    x_strm = C.io.StreamDef(field='stats', shape=input_dim, is_sparse=False)
    y_strm = C.io.StreamDef(field='forgery', shape=output_dim, is_sparse=False)
    streams = C.io.StreamDefs(x_src=x_strm, y_src=y_strm)
    deserial = C.io.CTFDeserializer(path, streams)
    mb_src = C.io.MinibatchSource(deserial, randomize=rnd_order, max_sweeps=sweeps)
    return mb_src

def main():
    print("Using CNTK version = " + str(C.__version__) + "\n")
    input_dim = 4
    hidden_dim = 10
    output_dim = 2
    train_file = ".\\...\\"  # provide the name of the training file
    test_file = ".\\...\\"  # provide the name of the test file
    # Untrained network: nnet gives raw outputs, model adds softmax for prediction
    X = C.ops.input_variable(input_dim, np.float32)
    Y = C.ops.input_variable(output_dim, np.float32)
    with C.layers.default_options(init=C.initializer.uniform(scale=0.01, seed=1)):
        hLayer = C.layers.Dense(hidden_dim, activation=C.ops.tanh, name='hidLayer')(X)
        oLayer = C.layers.Dense(output_dim, activation=None, name='outLayer')(hLayer)
    nnet = oLayer
    model = C.ops.softmax(nnet)
    tr_loss = C.cross_entropy_with_softmax(nnet, Y)
    tr_clas = C.classification_error(nnet, Y)
    max_iter = 500
    batch_size = 10
    learn_rate = 0.01
    learner = C.sgd(nnet.parameters, learn_rate)
    trainer = C.Trainer(nnet, (tr_loss, tr_clas), [learner])
    # Train
    rdr = create_reader(train_file, input_dim, output_dim, rnd_order=True, sweeps=C.io.INFINITELY_REPEAT)
    banknote_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }
    for i in range(0, max_iter):
        curr_batch = rdr.next_minibatch(batch_size, input_map=banknote_input_map)
        trainer.train_minibatch(curr_batch)
        if i % 50 == 0:
            mcee = trainer.previous_minibatch_loss_average
            macc = (1.0 - trainer.previous_minibatch_evaluation_average) * 100
            print("batch %4d: mean loss = %0.4f, accuracy = %0.2f%% " % (i, mcee, macc))
    # Evaluate on the test data
    print("\nEvaluating test data \n")
    rdr = create_reader(test_file, input_dim, output_dim, rnd_order=False, sweeps=1)
    banknote_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }
    num_test = 20
    all_test = rdr.next_minibatch(num_test, input_map=banknote_input_map)
    acc = (1.0 - trainer.test_minibatch(all_test)) * 100
    print("Classification accuracy = %0.2f%%" % acc)
    # Predict for one item: position 1 of the output corresponds to "authentic" (label 0 1)
    np.set_printoptions(precision = 1, suppress=True)
    unknown = np.array([[0.6, 1.9, -3.3, -0.3]], dtype=np.float32)
    print("\nPredicting Banknote authenticity for input features: ")
    print(unknown[0])
    pred_prob = model.eval(unknown)
    np.set_printoptions(precision = 4, suppress=True)
    print("Prediction probabilities are: ")
    print(pred_prob[0])
    if pred_prob[0,0] < pred_prob[0,1]:
        print("Prediction: authentic")
    else:
        print("Prediction: fake")

if __name__ == "__main__":
    main()

Output

Using CNTK version = 2.7
batch 0: mean loss = 0.6928, accuracy = 80.00%
batch 50: mean loss = 0.6877, accuracy = 70.00%
batch 100: mean loss = 0.6432, accuracy = 80.00%
batch 150: mean loss = 0.4978, accuracy = 80.00%
batch 200: mean loss = 0.4551, accuracy = 90.00%
batch 250: mean loss = 0.3755, accuracy = 90.00%
batch 300: mean loss = 0.2295, accuracy = 100.00%
batch 350: mean loss = 0.1542, accuracy = 100.00%
batch 400: mean loss = 0.1581, accuracy = 100.00%
batch 450: mean loss = 0.1499, accuracy = 100.00%
Evaluating test data
Classification accuracy = 84.58%
Predicting banknote authenticity for input features:
[0.6 1.9 -3.3 -0.3]
Prediction probabilities are:
[0.7847 0.2536]
Prediction: fake

One-node binary Classification model

The implementation is almost the same as the two-node program above. The network now has a single output node, and the label of each data item is a single value, 0 for authentic or 1 for fake. The main change is that with the two-node technique we can use the CNTK built-in classification_error() function, whereas for one-node classification CNTK doesn't support classification_error(). That is why we need to implement a program-defined function as follows −

def class_acc(mb, x_var, y_var, model):
    num_correct = 0; num_wrong = 0
    x_mat = mb[x_var].asarray()
    y_mat = mb[y_var].asarray()
    for i in range(mb[x_var].shape[0]):
        p = model.eval(x_mat[i])
        y = y_mat[i]
        # Correct when the output falls on the same side of 0.5 as the 0/1 label
        if p[0,0] < 0.5 and y[0,0] == 0.0 or p[0,0] >= 0.5 and y[0,0] == 1.0:
            num_correct += 1
        else:
            num_wrong += 1
    return (num_correct * 100.0)/(num_correct + num_wrong)
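
The thresholding rule inside class_acc can be tried without CNTK. The sketch below applies the same 0.5 cut-off to plain Python lists; threshold_accuracy is a hypothetical helper and the probabilities and labels are made-up values.

```python
def threshold_accuracy(probabilities, labels, threshold=0.5):
    """Percentage of items whose thresholded output matches the 0/1 label."""
    num_correct = 0
    for p, y in zip(probabilities, labels):
        predicted = 1.0 if p >= threshold else 0.0
        if predicted == y:
            num_correct += 1
    return num_correct * 100.0 / len(labels)

probs = [0.10, 0.80, 0.55, 0.30]  # hypothetical one-node outputs
labels = [0.0, 1.0, 0.0, 0.0]     # hypothetical labels: 0 = authentic, 1 = fake
print(threshold_accuracy(probs, labels))  # 3 of 4 correct
```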

With that change, let's see the complete one-node classification example −

Complete one-node Classification Model

import numpy as np
import cntk as C

def create_reader(path, input_dim, output_dim, rnd_order, sweeps):
    x_strm = C.io.StreamDef(field='stats', shape=input_dim, is_sparse=False)
    y_strm = C.io.StreamDef(field='forgery', shape=output_dim, is_sparse=False)
    streams = C.io.StreamDefs(x_src=x_strm, y_src=y_strm)
    deserial = C.io.CTFDeserializer(path, streams)
    mb_src = C.io.MinibatchSource(deserial, randomize=rnd_order, max_sweeps=sweeps)
    return mb_src

def class_acc(mb, x_var, y_var, model):
    num_correct = 0; num_wrong = 0
    x_mat = mb[x_var].asarray()
    y_mat = mb[y_var].asarray()
    for i in range(mb[x_var].shape[0]):
        p = model.eval(x_mat[i])
        y = y_mat[i]
        if p[0,0] < 0.5 and y[0,0] == 0.0 or p[0,0] >= 0.5 and y[0,0] == 1.0:
            num_correct += 1
        else:
            num_wrong += 1
    return (num_correct * 100.0)/(num_correct + num_wrong)

def main():
    print("Using CNTK version = " + str(C.__version__) + "\n")
    input_dim = 4
    hidden_dim = 10
    output_dim = 1
    train_file = ".\\...\\"  # provide the name of the training file
    test_file = ".\\...\\"  # provide the name of the test file
    X = C.ops.input_variable(input_dim, np.float32)
    Y = C.ops.input_variable(output_dim, np.float32)
    with C.layers.default_options(init=C.initializer.uniform(scale=0.01, seed=1)):
        hLayer = C.layers.Dense(hidden_dim, activation=C.ops.tanh, name='hidLayer')(X)
        oLayer = C.layers.Dense(output_dim, activation=None, name='outLayer')(hLayer)
    # A single node cannot use softmax: sigmoid squashes its output into (0, 1),
    # and binary cross entropy is the matching loss
    model = C.ops.sigmoid(oLayer)
    tr_loss = C.binary_cross_entropy(model, Y)
    max_iter = 1000
    batch_size = 10
    learn_rate = 0.01
    learner = C.sgd(model.parameters, learn_rate)
    trainer = C.Trainer(model, (tr_loss), [learner])
    # Train
    rdr = create_reader(train_file, input_dim, output_dim, rnd_order=True, sweeps=C.io.INFINITELY_REPEAT)
    banknote_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }
    for i in range(0, max_iter):
        curr_batch = rdr.next_minibatch(batch_size, input_map=banknote_input_map)
        trainer.train_minibatch(curr_batch)
        if i % 100 == 0:
            mcee = trainer.previous_minibatch_loss_average
            ca = class_acc(curr_batch, X, Y, model)
            print("batch %4d: mean loss = %0.4f, accuracy = %0.2f%% " % (i, mcee, ca))
    # Evaluate on the test data
    print("\nEvaluating test data \n")
    rdr = create_reader(test_file, input_dim, output_dim, rnd_order=False, sweeps=1)
    banknote_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }
    num_test = 20
    all_test = rdr.next_minibatch(num_test, input_map=banknote_input_map)
    acc = class_acc(all_test, X, Y, model)
    print("Classification accuracy = %0.2f%%" % acc)
    # Predict for one item: below 0.5 means authentic (label 0), otherwise fake (label 1)
    np.set_printoptions(precision = 1, suppress=True)
    unknown = np.array([[0.6, 1.9, -3.3, -0.3]], dtype=np.float32)
    print("\nPredicting Banknote authenticity for input features: ")
    print(unknown[0])
    pred_prob = model.eval({X:unknown})
    print("Prediction probability: ")
    print("%0.4f" % pred_prob[0,0])
    if pred_prob[0,0] < 0.5:
        print("Prediction: authentic")
    else:
        print("Prediction: fake")

if __name__ == "__main__":
    main()

Output

Using CNTK version = 2.7
batch 0: mean loss = 0.6936, accuracy = 10.00%
batch 100: mean loss = 0.6882, accuracy = 70.00%
batch 200: mean loss = 0.6597, accuracy = 50.00%
batch 300: mean loss = 0.5298, accuracy = 70.00%
batch 400: mean loss = 0.4090, accuracy = 100.00%
batch 500: mean loss = 0.3790, accuracy = 90.00%
batch 600: mean loss = 0.1852, accuracy = 100.00%
batch 700: mean loss = 0.1135, accuracy = 100.00%
batch 800: mean loss = 0.1285, accuracy = 100.00%
batch 900: mean loss = 0.1054, accuracy = 100.00%
Evaluating test data
Classification accuracy = 84.00%
Predicting banknote authenticity for input features:
[0.6 1.9 -3.3 -0.3]
Prediction probability:
0.8846
Prediction: fake

CNTK - Neural Network Regression

This chapter will help you understand neural network regression with CNTK.

Introduction

As we know, regression is used to predict a numeric value from one or more predictor variables. Let's take the example of predicting the median value of a house in, say, one of 100 towns. To do so, we have data that includes −

  1. A crime statistic for each town.

  2. The age of the houses in each town.

  3. A measure of the distance from each town to a prime location.

  4. The student-to-teacher ratio in each town.

  5. A racial demographic statistic for each town.

  6. The median house value in each town.

Based on the first five items, which are the predictor variables, we would like to predict the median house value. For this, we can create a linear regression model along the lines of −

Y = a0 + a1(crime) + a2(house-age) + a3(distance) + a4(ratio) + a5(racial)

In the above equation −

Y is the predicted median house value,

a0 is a constant, and

a1 through a5 are constants associated with the five predictors discussed above.
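
As a worked illustration of the linear model, the sketch below evaluates the equation for one town. The coefficient values are entirely hypothetical and chosen only to show the arithmetic; the predictor values are the ones used for the prediction example later in this chapter.

```python
def predict_median_value(coeffs, predictors):
    """Evaluate Y = a0 + a1*x1 + ... + a5*x5 for one town."""
    a0, rest = coeffs[0], coeffs[1:]
    return a0 + sum(a * x for a, x in zip(rest, predictors))

coeffs = [30.0, -0.2, -0.05, 0.5, -0.4, 0.01]  # hypothetical a0..a5, not fitted values
town = [0.09, 50.00, 4.5, 17.00, 350.00]       # crime, house-age, distance, ratio, racial
y = predict_median_value(coeffs, town)
print(round(y, 3))
```

Fitting a linear model means choosing a0 through a5 so that such predictions match the known median values as closely as possible.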

An alternative approach is to use a neural network, which can often produce a more accurate prediction model.

Here, we will create a neural network regression model using CNTK.

Loading Dataset

To implement neural network regression using CNTK, we will use the Boston area house values dataset. The dataset can be downloaded from the UCI Machine Learning Repository at https://archive.ics.uci.edu/. It has 14 variables and 506 instances in total.

For our implementation, however, we will use 6 of the 14 variables and 100 instances. Of the 6 variables, 5 are predictors and 1 is the value to predict. Of the 100 instances, we will use 80 for training and 20 for testing. The value we want to predict is the median house price in a town. Let's look at the five predictors we will use −

  1. Crime per capita in the town − We would expect larger values of this predictor to be associated with smaller median house values.

  2. Proportion of owner-occupied units built before 1940 − We would expect larger values of this predictor to be associated with smaller median house values, because a larger value means older houses.

  3. Weighted distance of the town to five Boston employment centers.

  4. Area school pupil-to-teacher ratio.

  5. An indirect metric of the proportion of black residents in the town.

Preparing training & test files

As before, we first need to convert the raw data into CNTK format. We will use the first 80 data items for training, so the tab-delimited CNTK format is as follows −

|predictors 1.612820 96.90 3.76 21.00 248.31 |medval 13.50
|predictors 0.064170 68.20 3.36 19.20 396.90 |medval 18.90
|predictors 0.097440 61.40 3.38 19.20 377.56 |medval 20.00
. . .

The next 20 items, also converted into CNTK format, will be used for testing.

Constructing Regression model

First, we need to read the data files in CNTK format, and for that we will use the helper function named create_reader, as follows −

def create_reader(path, input_dim, output_dim, rnd_order, sweeps):
    # Map the 'predictors' and 'medval' fields of the file to two dense streams
    x_strm = C.io.StreamDef(field='predictors', shape=input_dim, is_sparse=False)
    y_strm = C.io.StreamDef(field='medval', shape=output_dim, is_sparse=False)
    streams = C.io.StreamDefs(x_src=x_strm, y_src=y_strm)
    deserial = C.io.CTFDeserializer(path, streams)
    mb_src = C.io.MinibatchSource(deserial, randomize=rnd_order, max_sweeps=sweeps)
    return mb_src

Next, we need to create a helper function that accepts a CNTK mini-batch object and computes a custom accuracy metric. A prediction counts as correct when it is within delta of the actual value.

def mb_accuracy(mb, x_var, y_var, model, delta):
    num_correct = 0
    num_wrong = 0
    x_mat = mb[x_var].asarray()
    y_mat = mb[y_var].asarray()
    for i in range(mb[x_var].shape[0]):
        v = model.eval(x_mat[i])
        y = y_mat[i]
        if np.abs(v[0,0] - y[0,0]) < delta:
            num_correct += 1
        else:
            num_wrong += 1
    return (num_correct * 100.0)/(num_correct + num_wrong)
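
The within-delta rule can be checked on plain Python lists without CNTK. In this sketch within_delta_accuracy is a hypothetical helper and both value lists are made up for illustration.

```python
def within_delta_accuracy(predicted, actual, delta):
    """Percentage of predictions that lie strictly within delta of the actual value."""
    num_correct = sum(1 for v, y in zip(predicted, actual) if abs(v - y) < delta)
    return num_correct * 100.0 / len(actual)

predicted = [14.9, 21.5, 19.0, 30.2]  # hypothetical model outputs (x1000 dollars)
actual = [13.5, 18.9, 20.0, 24.0]     # hypothetical true median values
print(within_delta_accuracy(predicted, actual, delta=3.00))  # 3 of 4 are within 3.00
```

Regression has no natural notion of a "correct" prediction, so the choice of delta is what defines accuracy here; a smaller delta gives a stricter metric.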

Now, we need to set the architecture arguments for our NN and provide the location of the data files. This can be done with the following Python code −

def main():
    print("Using CNTK version = " + str(C.__version__) + "\n")
    input_dim = 5
    hidden_dim = 20
    output_dim = 1
    train_file = ".\\...\\"  # provide the name of the training file (80 data items)
    test_file = ".\\...\\"  # provide the name of the test file (20 data items)

Next, the following lines of code create the untrained NN −

X = C.ops.input_variable(input_dim, np.float32)
Y = C.ops.input_variable(output_dim, np.float32)
with C.layers.default_options(init=C.initializer.uniform(scale=0.01, seed=1)):
    hLayer = C.layers.Dense(hidden_dim, activation=C.ops.tanh, name='hidLayer')(X)
    oLayer = C.layers.Dense(output_dim, activation=None, name='outLayer')(hLayer)
model = C.ops.alias(oLayer)

Once we have created the untrained model, we need to set up a Learner algorithm object. We will use the SGD learner and the squared_error loss function −

tr_loss = C.squared_error(model, Y)
max_iter = 3000
batch_size = 5
base_learn_rate = 0.02
sch=C.learning_parameter_schedule([base_learn_rate, base_learn_rate/2], minibatch_size=batch_size, epoch_size=int((max_iter*batch_size)/2))
learner = C.sgd(model.parameters, sch)
trainer = C.Trainer(model, (tr_loss), [learner])
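
The schedule above halves the learning rate halfway through training. The sketch below reproduces that arithmetic in plain Python. It assumes that each entry of the schedule applies for epoch_size samples and that the last entry stays in effect afterwards; learning_rate_at is a hypothetical helper, not a CNTK function.

```python
def learning_rate_at(iteration, batch_size=5, max_iter=3000, base_learn_rate=0.02):
    """Learning rate in effect at a given mini-batch iteration (assumed schedule semantics)."""
    rates = [base_learn_rate, base_learn_rate / 2]
    epoch_size = int((max_iter * batch_size) / 2)  # 7500 samples per schedule entry
    samples_seen = iteration * batch_size
    index = min(samples_seen // epoch_size, len(rates) - 1)
    return rates[index]

for it in (0, 1499, 1500, 2999):
    print(it, learning_rate_at(it))
```

Under these assumptions, iterations 0 to 1499 train with 0.02 and iterations 1500 to 2999 train with 0.01.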

Once the Learner and Trainer objects are ready, we need to create a reader to read the training data −

rdr = create_reader(train_file, input_dim, output_dim, rnd_order=True, sweeps=C.io.INFINITELY_REPEAT)
boston_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }

Now, it's time to train our NN model −

for i in range(0, max_iter):
    curr_batch = rdr.next_minibatch(batch_size, input_map=boston_input_map)
    trainer.train_minibatch(curr_batch)
    if i % int(max_iter/10) == 0:
        mcee = trainer.previous_minibatch_loss_average
        acc = mb_accuracy(curr_batch, X, Y, model, delta=3.00)
        print("batch %4d: mean squared error = %8.4f, accuracy = %5.2f%% " % (i, mcee, acc))

Once training is done, let's evaluate the model using the test data items −

print("\nEvaluating test data \n")
rdr = create_reader(test_file, input_dim, output_dim, rnd_order=False, sweeps=1)
boston_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }
num_test = 20
all_test = rdr.next_minibatch(num_test, input_map=boston_input_map)
acc = mb_accuracy(all_test, X, Y, model, delta=3.00)
print("Prediction accuracy = %0.2f%%" % acc)

After evaluating the accuracy of our trained NN model, we will use it to make a prediction on unseen data −

np.set_printoptions(precision = 2, suppress=True)
unknown = np.array([[0.09, 50.00, 4.5, 17.00, 350.00]], dtype=np.float32)
print("\nPredicting median home value for feature/predictor values: ")
print(unknown[0])
pred_value = model.eval({X: unknown})
print("\nPredicted value is: ")
print("$%0.2f (x1000)" % pred_value[0,0])

Complete Regression Model

import numpy as np
import cntk as C

def create_reader(path, input_dim, output_dim, rnd_order, sweeps):
    x_strm = C.io.StreamDef(field='predictors', shape=input_dim, is_sparse=False)
    y_strm = C.io.StreamDef(field='medval', shape=output_dim, is_sparse=False)
    streams = C.io.StreamDefs(x_src=x_strm, y_src=y_strm)
    deserial = C.io.CTFDeserializer(path, streams)
    mb_src = C.io.MinibatchSource(deserial, randomize=rnd_order, max_sweeps=sweeps)
    return mb_src

def mb_accuracy(mb, x_var, y_var, model, delta):
    num_correct = 0
    num_wrong = 0
    x_mat = mb[x_var].asarray()
    y_mat = mb[y_var].asarray()
    for i in range(mb[x_var].shape[0]):
        v = model.eval(x_mat[i])
        y = y_mat[i]
        if np.abs(v[0,0] - y[0,0]) < delta:
            num_correct += 1
        else:
            num_wrong += 1
    return (num_correct * 100.0)/(num_correct + num_wrong)

def main():
    print("Using CNTK version = " + str(C.__version__) + "\n")
    input_dim = 5
    hidden_dim = 20
    output_dim = 1
    train_file = ".\\...\\"  # provide the name of the training file (80 data items)
    test_file = ".\\...\\"  # provide the name of the test file (20 data items)
    X = C.ops.input_variable(input_dim, np.float32)
    Y = C.ops.input_variable(output_dim, np.float32)
    with C.layers.default_options(init=C.initializer.uniform(scale=0.01, seed=1)):
        hLayer = C.layers.Dense(hidden_dim, activation=C.ops.tanh, name='hidLayer')(X)
        oLayer = C.layers.Dense(output_dim, activation=None, name='outLayer')(hLayer)
    model = C.ops.alias(oLayer)
    tr_loss = C.squared_error(model, Y)
    max_iter = 3000
    batch_size = 5
    base_learn_rate = 0.02
    sch = C.learning_parameter_schedule([base_learn_rate, base_learn_rate/2], minibatch_size=batch_size, epoch_size=int((max_iter*batch_size)/2))
    learner = C.sgd(model.parameters, sch)
    trainer = C.Trainer(model, (tr_loss), [learner])
    # Train
    rdr = create_reader(train_file, input_dim, output_dim, rnd_order=True, sweeps=C.io.INFINITELY_REPEAT)
    boston_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }
    for i in range(0, max_iter):
        curr_batch = rdr.next_minibatch(batch_size, input_map=boston_input_map)
        trainer.train_minibatch(curr_batch)
        if i % int(max_iter/10) == 0:
            mcee = trainer.previous_minibatch_loss_average
            acc = mb_accuracy(curr_batch, X, Y, model, delta=3.00)
            print("batch %4d: mean squared error = %8.4f, accuracy = %5.2f%% " % (i, mcee, acc))
    # Evaluate on the test data
    print("\nEvaluating test data \n")
    rdr = create_reader(test_file, input_dim, output_dim, rnd_order=False, sweeps=1)
    boston_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }
    num_test = 20
    all_test = rdr.next_minibatch(num_test, input_map=boston_input_map)
    acc = mb_accuracy(all_test, X, Y, model, delta=3.00)
    print("Prediction accuracy = %0.2f%%" % acc)
    # Predict for one item
    np.set_printoptions(precision = 2, suppress=True)
    unknown = np.array([[0.09, 50.00, 4.5, 17.00, 350.00]], dtype=np.float32)
    print("\nPredicting median home value for feature/predictor values: ")
    print(unknown[0])
    pred_value = model.eval({X: unknown})
    print("\nPredicted value is: ")
    print("$%0.2f (x1000)" % pred_value[0,0])

if __name__ == "__main__":
    main()

Output

Using CNTK version = 2.7
batch 0: mean squared error = 385.6727, accuracy = 0.00%
batch 300: mean squared error = 41.6229, accuracy = 20.00%
batch 600: mean squared error = 28.7667, accuracy = 40.00%
batch 900: mean squared error = 48.6435, accuracy = 40.00%
batch 1200: mean squared error = 77.9562, accuracy = 80.00%
batch 1500: mean squared error = 7.8342, accuracy = 60.00%
batch 1800: mean squared error = 47.7062, accuracy = 60.00%
batch 2100: mean squared error = 40.5068, accuracy = 40.00%
batch 2400: mean squared error = 46.5023, accuracy = 40.00%
batch 2700: mean squared error = 15.6235, accuracy = 60.00%
Evaluating test data
Prediction accuracy = 64.00%
Predicting median home value for feature/predictor values:
[0.09 50. 4.5 17. 350.]
Predicted value is:
$21.02(x1000)
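
The accuracy figures above count a prediction as correct when it falls within delta (3.00, that is $3,000) of the true median value. The following is a minimal pure-Python sketch of that idea, independent of CNTK. The function name within_delta_accuracy and the sample numbers are illustrative assumptions and are not taken from the Boston dataset.

```python
def within_delta_accuracy(predicted, actual, delta):
    """Percentage of predictions whose absolute error is below delta."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    num_correct = sum(1 for p, a in zip(predicted, actual) if abs(p - a) < delta)
    return num_correct * 100.0 / len(predicted)


# Hypothetical predictions and targets, in thousands of dollars.
predicted = [21.0, 30.5, 18.2, 25.0, 40.0]
actual = [22.5, 24.0, 18.0, 27.9, 35.0]
accuracy = within_delta_accuracy(predicted, actual, delta=3.00)
print("accuracy = %0.2f%%" % accuracy)
```

Because each minibatch in the training loop holds only 5 items, this kind of accuracy can only move in steps of 20 percent, which is why the per-batch figures in the output jump around.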

Saving the trained model

The Boston Home value dataset has only 506 data items (of which we used only 100). Hence, it takes only a few seconds to train the NN regressor model, but training on a much larger dataset can take hours or even days.

We can save our model so that we won’t have to retrain it from scratch. With the help of the following Python code, we can save our trained NN −

nn_regressor = ".\\neuralregressor.model" #provide the name of the file
model.save(nn_regressor, format=C.ModelFormat.CNTKv2)

Following are the arguments of the save() function used above −

  1. The file name is the first argument of the save() function. It can also include the path to the file.

  2. The other parameter is format, which has the default value C.ModelFormat.CNTKv2.

Loading the trained model

Once the trained model has been saved, it is very easy to load it again. We only need to use the load() function. Let’s check this in the following example −

import numpy as np
import cntk as C
model = C.ops.functions.Function.load(".\\neuralregressor.model")
np.set_printoptions(precision=2, suppress=True)
unknown = np.array([[0.09, 50.00, 4.5, 17.00, 350.00]], dtype=np.float32)
print("\nPredicting area median home value for feature/predictor values: ")
print(unknown[0])
# The original input variable X is not available after loading, so use the model's own input
pred_value = model.eval({model.arguments[0]: unknown})
print("\nPredicted value is: ")
print("$%0.2f (x1000)" % pred_value[0, 0])

The benefit of a saved model is that once you load it, it can be used exactly as if the model had just been trained.

CNTK - Classification Model

This chapter will help you understand how to measure the performance of a classification model in CNTK. Let us begin with the confusion matrix.

Confusion matrix

Confusion matrix − a table of the predicted output versus the expected output. It is the easiest way to measure the performance of a classification problem where the output can be of two or more classes.

To understand how it works, we are going to create a confusion matrix for a binary classification model that predicts whether a credit card transaction was normal or fraudulent. It is shown as follows −

                   Actual fraud     Actual normal
Predicted fraud    True positive    False positive
Predicted normal   False negative   True negative

As we can see, the sample confusion matrix above contains 2 columns, one for the actual class fraud and one for the actual class normal. In the same way it has 2 rows, one for the predicted class fraud and one for the predicted class normal. Following is the explanation of the terms associated with the confusion matrix, where 1 is the positive class (fraud) and 0 is the negative class (normal) −

  1. True Positives − When both the actual class and the predicted class of a data point are 1.

  2. True Negatives − When both the actual class and the predicted class of a data point are 0.

  3. False Positives − When the actual class of a data point is 0 and the predicted class is 1.

  4. False Negatives − When the actual class of a data point is 1 and the predicted class is 0.
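
The four terms can be counted directly from a list of actual labels and a list of predicted labels. The following is a minimal pure-Python sketch; the function name confusion_counts and the sample labels are illustrative assumptions, not part of CNTK.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, tn, fp, fn) for a binary classification problem."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1      # fraud, predicted fraud
        elif a != positive and p != positive:
            tn += 1      # normal, predicted normal
        elif a != positive and p == positive:
            fp += 1      # normal, predicted fraud
        else:
            fn += 1      # fraud, predicted normal
    return tp, tn, fp, fn


# Hypothetical labels: 1 = fraud, 0 = normal.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
tp, tn, fp, fn = confusion_counts(actual, predicted)
print(tp, tn, fp, fn)
```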

Let’s see how we can calculate different metrics from the confusion matrix. Below, TP, TN, FP and FN stand for the number of true positives, true negatives, false positives and false negatives −

  1. Accuracy − The fraction of all predictions that our ML classification model got right. It can be calculated with the following formula −

     $$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

  2. Precision − It tells us how many of the samples we predicted as positive were actually positive. It can be calculated with the following formula −

     $$\text{Precision} = \frac{TP}{TP + FP}$$

  3. Recall or Sensitivity − The fraction of actual positives that our ML classification model found. In other words, it tells us how many of the fraud cases in the dataset were actually detected by the model. It can be calculated with the following formula −

     $$\text{Recall} = \frac{TP}{TP + FN}$$

  4. Specificity − The counterpart of recall for the negative class: the fraction of actual negatives that our ML classification model identified as negative. It can be calculated with the following formula −

     $$\text{Specificity} = \frac{TN}{TN + FP}$$
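
To make the four metrics concrete, here is a small pure-Python sketch that computes them from the counts of true/false positives and negatives, using the standard definitions. The function name classification_metrics and the counts are hypothetical and chosen only for illustration.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and specificity from raw counts.

    Assumes every denominator is non-zero, i.e. each class occurs at least
    once and at least one positive prediction was made.
    """
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts for a fraud detector evaluated on 100 transactions.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print("%-12s %.3f" % (name, value))
```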

F-measure

We can use the F-measure as an alternative to the confusion matrix. The main reason is that we cannot maximize recall and precision at the same time. The two metrics are strongly related, as the following example shows −

Suppose we want to use a DL model to classify cell samples as cancerous or normal. To reach maximum precision, we would reduce the number of positive predictions to 1, keeping only the one the model is most sure about. This can give us close to 100 percent precision, but recall becomes very low.

On the other hand, to reach maximum recall, we would make as many positive predictions as possible. This can give us close to 100 percent recall, but precision becomes very low.

In practice, we need to find a balance between precision and recall. The F-measure metric allows us to do so, as it expresses a harmonic mean of precision and recall −

$$F_\beta = (1 + \beta^2) \cdot \frac{\text{Precision} \cdot \text{Recall}}{\beta^2 \cdot \text{Precision} + \text{Recall}}$$

When the extra factor B ($\beta$) is set to 1, this formula is called the F1-measure and weighs precision and recall equally. To emphasize recall, we can set the factor B to 2. To emphasize precision, we can set the factor B to 0.5.
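
The effect of the factor B is easiest to see with numbers. The sketch below implements the general F-measure, (1 + B²) · precision · recall / (B² · precision + recall), in plain Python. The function name f_beta and the precision/recall values are illustrative assumptions.

```python
def f_beta(precision, recall, beta=1.0):
    """General F-measure; beta > 1 favours recall, beta < 1 favours precision."""
    b2 = beta ** 2
    denominator = b2 * precision + recall
    if denominator == 0:
        return 0.0  # convention when both precision and recall are zero
    return (1 + b2) * precision * recall / denominator


# Hypothetical model with high precision but modest recall.
precision, recall = 0.9, 0.6
f1 = f_beta(precision, recall)
f2 = f_beta(precision, recall, beta=2.0)
f_half = f_beta(precision, recall, beta=0.5)
print("F1 = %.4f, F2 = %.4f, F0.5 = %.4f" % (f1, f2, f_half))
```

Because recall is the weaker of the two values here, the recall-weighted score (B = 2) comes out lower than F1, and the precision-weighted score (B = 0.5) comes out higher.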

Using CNTK to measure classification performance

In the previous section, we created a classification model using the Iris flower dataset. Here, we will measure its performance by using the confusion matrix and the F-measure metric.

Creating Confusion matrix

We have already created the model, so we can start the validation process, which includes the confusion matrix. First, we are going to create the confusion matrix with the help of the confusion_matrix function from scikit-learn. For this, we need the real labels for our test samples and the predicted labels for the same test samples.

Let’s calculate the confusion matrix by using the following Python code −

import numpy as np
from sklearn.metrics import confusion_matrix
# Convert one-hot labels and model outputs to class indices
y_true = np.argmax(y_test, axis=1)
y_pred = np.argmax(z(X_test), axis=1)
matrix = confusion_matrix(y_true=y_true, y_pred=y_pred)
print(matrix)

Output

[[10 0 0]
[ 0 1 9]
[ 0 0 10]]
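
A multi-class confusion matrix can be read the same way as the binary one. In scikit-learn's layout, rows are the true classes and columns are the predicted classes, so the diagonal holds the correct predictions. The following pure-Python sketch derives overall accuracy and per-class recall and precision from the matrix printed above; the variable names are illustrative.

```python
# Confusion matrix from the output above (rows: true class, columns: predicted class).
matrix = [
    [10, 0, 0],
    [0, 1, 9],
    [0, 0, 10],
]
n = len(matrix)

total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(n))
accuracy = correct / total

# Recall per class: correct predictions divided by the row total.
recall = [matrix[i][i] / sum(matrix[i]) for i in range(n)]

# Precision per class: correct predictions divided by the column total.
column_totals = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
precision = [matrix[i][i] / column_totals[i] for i in range(n)]

print("accuracy :", accuracy)
print("recall   :", recall)
print("precision:", precision)
```

The second row shows where this model struggles: 9 of the 10 samples of the second class were predicted as the third class, so the recall for that class is only 0.1.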

We can also use the heatmap function from seaborn to visualise the confusion matrix as follows −

import seaborn as sns
import matplotlib.pyplot as plt
g = sns.heatmap(matrix,
     annot=True,
     xticklabels=label_encoder.classes_.tolist(),
     yticklabels=label_encoder.classes_.tolist(),
     cmap='Blues')
g.set_yticklabels(g.get_yticklabels(), rotation=0)
plt.show()

We should also have a single performance number that we can use to compare models. For this, we calculate the classification error by using the classification_error function from the metrics package in CNTK, as we did while creating the classification model.

Now, to calculate the classification error, execute the test method on the loss function with a dataset. CNTK then takes the samples we provide as input and makes a prediction based on the input features X_test.

loss.test([X_test, y_test])

Output

{'metric': 0.36666666666, 'samples': 30}

Implementing F-Measures

For implementing F-measures, CNTK also includes a function called fmeasure. We can use it while training the NN by replacing the call to cntk.metrics.classification_error with a call to cntk.losses.fmeasure when defining the criterion factory function, as follows −

import cntk
@cntk.Function
def criterion_factory(output, target):
   loss = cntk.losses.cross_entropy_with_softmax(output, target)
   metric = cntk.losses.fmeasure(output, target)
   return loss, metric

After switching to the cntk.losses.fmeasure function, the loss.test method call returns a different output, as follows −

loss.test([X_test, y_test])

Output

{'metric': 0.83101488749, 'samples': 30}

CNTK - Regression Model

Here, we will study how to measure the performance of a regression model.

Basics of validating a regression model

Regression models differ from classification models in that there is no binary measure of right or wrong for individual samples. In regression models, we want to measure how close the prediction is to the actual value. The closer the predicted value is to the expected output, the better the model performs.

Here, we are going to measure the performance of an NN used for regression using different error-rate functions.

Calculating error margin

As discussed earlier, while validating a regression model, we can’t say whether a prediction is right or wrong. We want our prediction to be as close as possible to the real value, but a small error margin is acceptable.

The formula for calculating the error margin is as follows −

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2$$

Here,

Predicted value = $\hat{y}$ (y with a hat)

Real value = $y$

First, we calculate the distance between the predicted and the real value and square it. Then, to get an overall error rate, we sum these squared distances and calculate the average. This is called the mean squared error function.

But if we want performance figures that express an error margin, we need a formula that expresses the absolute error. The formula for the mean absolute error function is as follows −

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right|$$

The above formula takes the absolute distance between the predicted and the real value.
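
The difference between the two error functions is easy to see on a few numbers. The following pure-Python sketch computes both using the standard definitions (average of squared differences, average of absolute differences). The function names and the sample values are illustrative assumptions and do not come from any dataset used in this tutorial.

```python
def mean_squared_error(predicted, actual):
    """Average of the squared differences between prediction and real value."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def mean_absolute_error(predicted, actual):
    """Average of the absolute differences between prediction and real value."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical predictions and real values.
predicted = [22.0, 30.0, 18.0, 25.0]
actual = [24.0, 27.0, 18.0, 26.0]
mse = mean_squared_error(predicted, actual)
mae = mean_absolute_error(predicted, actual)
print("MSE = %.2f, MAE = %.2f" % (mse, mae))
```

The mean absolute error is expressed in the same unit as the target, so it can be read directly as an average error margin, while the mean squared error penalises large misses more heavily.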

Using CNTK to measure regression performance

Here, we will look at how to use the metrics we discussed in combination with CNTK. We will use a regression model that predicts miles per gallon for cars, following the steps given below.

Implementation steps −

Step 1 − First, we need to import the required components from the cntk package as follows −

from cntk import default_options, input_variable
from cntk.layers import Dense, Sequential
from cntk.ops import relu

Step 2 − Next, we define a default activation function using the default_options function. Then, we create a new Sequential layer set with two Dense layers of 64 neurons each. Finally, we add one more Dense layer (which acts as the output layer) to the Sequential layer set, with 1 neuron and no activation, as follows −

with default_options(activation=relu):
   model = Sequential([Dense(64),Dense(64),Dense(1,activation=None)])

Step 3 − Once the network has been created, we need to create an input variable for the features. We need to make sure that it has the same shape as the features that we are going to use for training.

features = input_variable(X.shape[1])

Step 4 − Now, we need to create another input_variable with size 1. It will be used to store the expected value for the NN.

target = input_variable(1)
z = model(features)

Now, we need to train the model. To do so, we are going to split the dataset and perform preprocessing using the following implementation steps −

Step 5 − First, import StandardScaler from sklearn.preprocessing. It rescales each feature to zero mean and unit variance, which helps against exploding gradient problems in the NN.

from sklearn.preprocessing import StandardScaler

Step 6 − Next, import train_test_split from sklearn.model_selection as follows −

from sklearn.model_selection import train_test_split

Step 7 − Drop the mpg column from the dataset by using the drop method. Finally, split the dataset into a training set and a validation set using the train_test_split function as follows −

x = df_cars.drop(columns=['mpg']).values.astype(np.float32)
y = df_cars.iloc[:, 0].values.reshape(-1, 1).astype(np.float32)
scaler = StandardScaler()
X = scaler.fit_transform(x)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
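
To illustrate what the scaler in the steps above does to a single feature column, here is a minimal pure-Python sketch of standardization: subtract the column mean and divide by the population standard deviation, which is how scikit-learn's StandardScaler scales by default. The function name standardize and the horsepower values are illustrative assumptions.

```python
import math


def standardize(column):
    """Rescale a list of numbers to zero mean and unit variance."""
    n = len(column)
    mean = sum(column) / n
    variance = sum((v - mean) ** 2 for v in column) / n  # population variance
    std = math.sqrt(variance)
    if std == 0:
        return [0.0 for _ in column]  # constant column: nothing to scale
    return [(v - mean) / std for v in column]


# Hypothetical horsepower values for three cars.
horsepower = [100.0, 150.0, 200.0]
scaled = standardize(horsepower)
print(["%.4f" % v for v in scaled])
```

Standardized values are centred on zero but are not confined to a fixed range; how far they spread depends on the distribution of the original feature.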

We have split and preprocessed the data, so now we need to train the NN. As we did in previous sections while creating a regression model, we need to define a combination of a loss and a metric function to train the model.

import cntk
from cntk.losses import squared_error
def absolute_error(output, target):
   return cntk.ops.reduce_mean(cntk.ops.abs(output - target))
@cntk.Function
def criterion_factory(output, target):
   loss = squared_error(output, target)
   metric = absolute_error(output, target)
   return loss, metric

Now, let’s have a look at how to train the model. For our model, we will use criterion_factory as the loss and metric combination.

from cntk.losses import squared_error
from cntk.learners import sgd
from cntk.logging import ProgressPrinter
progress_printer = ProgressPrinter(0)
loss = criterion_factory(z, target)
learner = sgd(z.parameters, 0.001)
training_summary = loss.train((X_train, y_train), parameter_learners=[learner], callbacks=[progress_printer], minibatch_size=16, max_epochs=10)

Complete implementation example

import numpy as np
import cntk
from cntk import default_options, input_variable
from cntk.layers import Dense, Sequential
from cntk.ops import relu
from cntk.losses import squared_error
from cntk.learners import sgd
from cntk.logging import ProgressPrinter
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# df_cars is assumed to be a pandas DataFrame with the car data, loaded beforehand
x = df_cars.drop(columns=['mpg']).values.astype(np.float32)
y = df_cars.iloc[:, 0].values.reshape(-1, 1).astype(np.float32)
scaler = StandardScaler()
X = scaler.fit_transform(x)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Network: two hidden layers of 64 neurons and a single linear output
with default_options(activation=relu):
   model = Sequential([Dense(64),Dense(64),Dense(1,activation=None)])
features = input_variable(X.shape[1])
target = input_variable(1)
z = model(features)

def absolute_error(output, target):
   return cntk.ops.reduce_mean(cntk.ops.abs(output - target))

@cntk.Function
def criterion_factory(output, target):
   loss = squared_error(output, target)
   metric = absolute_error(output, target)
   return loss, metric

progress_printer = ProgressPrinter(0)
loss = criterion_factory(z, target)
learner = sgd(z.parameters, 0.001)
training_summary = loss.train((X_train, y_train), parameter_learners=[learner], callbacks=[progress_printer], minibatch_size=16, max_epochs=10)

Output

-------------------------------------------------------------------
average  since   average   since  examples
loss     last    metric    last
------------------------------------------------------
Learning rate per minibatch: 0.001
690       690     24.9     24.9       16
654       636     24.1     23.7       48
[………]

In order to validate our regression model, we need to make sure that the model handles new data just as well as it handles the training data. For this, we invoke the test method on the loss and metric combination with the test data as follows −

loss.test([X_test, y_test])

Output

{'metric': 1.89679785619, 'samples': 79}

CNTK - Out-of-Memory Datasets

In this chapter, we explain how to measure performance when working with out-of-memory datasets.

In previous sections, we discussed various methods to validate the performance of our NN, but all of them deal with datasets that fit in memory.

This raises the question of what to do about out-of-memory datasets, because in production scenarios we need a lot of data to train an NN. In this section, we discuss how to measure performance when working with minibatch sources and with a manual minibatch loop.

Minibatch sources

While working with out-of-memory datasets, i.e. with minibatch sources, we need a slightly different setup for the loss and the metric than the one we used for small, in-memory datasets. First, we will see how to set up a way to feed data to the trainer of the NN model.

Following are the implementation steps −

Step 1 − First, import the components for creating the minibatch source from the cntk.io module as follows −

from cntk.io import StreamDef, StreamDefs, MinibatchSource, CTFDeserializer, INFINITELY_REPEAT

Step 2 − Next, create a new function named create_datasource. This function has two parameters, filename and limit, where limit has a default value of INFINITELY_REPEAT.

def create_datasource(filename, limit=INFINITELY_REPEAT):

Step 3 − Now, within the function, use the StreamDef class to create a stream definition for the labels. It reads from the labels field, which has three values per sample. We also need to set is_sparse to False as follows −

   labels_stream = StreamDef(field='labels', shape=3, is_sparse=False)

Step 4 − Next, to read the features field from the input file, create another instance of StreamDef as follows −

   feature_stream = StreamDef(field='features', shape=4, is_sparse=False)

Step 5 − Now, initialise the CTFDeserializer instance. Specify the filename and the streams that we need to deserialize as follows −

   deserializer = CTFDeserializer(filename, StreamDefs(labels=labels_stream, features=feature_stream))

Step 6 − Next, we need to create an instance of MinibatchSource by using the deserializer as follows −

   minibatch_source = MinibatchSource(deserializer, randomize=True, max_sweeps=limit)
   return minibatch_source

Step 7 − Finally, we need to provide the training and testing sources, which use the same files we created in previous sections. We are using the Iris flower dataset.

training_source = create_datasource('Iris_train.ctf')
test_source = create_datasource('Iris_test.ctf', limit=1)
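
The deserializer above expects files in the CNTK Text Format (CTF), in which every line holds one sample and each field is introduced by a pipe and the field name. As a rough sketch of what Iris_train.ctf could contain, the pure-Python helper below builds such lines with the features and labels field names used above. The function name to_ctf_line and the two sample rows are illustrative assumptions; the real files from the earlier sections may be laid out differently (for example with the fields in another order).

```python
def to_ctf_line(features, label_index, num_classes=3):
    """Format one sample as a CTF line with a one-hot encoded label."""
    one_hot = ["1" if i == label_index else "0" for i in range(num_classes)]
    feature_text = " ".join(str(v) for v in features)
    return "|features {} |labels {}".format(feature_text, " ".join(one_hot))


# Hypothetical samples: four measurements and a class index (0, 1 or 2).
rows = [
    ([5.1, 3.5, 1.4, 0.2], 0),
    ([7.0, 3.2, 4.7, 1.4], 1),
]
lines = [to_ctf_line(features, label) for features, label in rows]
for line in lines:
    print(line)
```

The shape=4 and shape=3 arguments of the two StreamDef instances correspond to the number of values after |features and |labels on each line.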

Once the MinibatchSource instances have been created, we need to train the model with them. We can use the same training logic as when we worked with small in-memory datasets, except that the MinibatchSource instance now supplies the training data.

Following are the implementation steps −

Step 1 − In order to log the output of the training session, first import ProgressPrinter from the cntk.logging module as follows −

from cntk.logging import ProgressPrinter

Step 2 − Next, to set up the training session, import Trainer and training_session from the cntk.train module as follows −

from cntk.train import Trainer, training_session

Step 3 − Now, we need to define a set of constants such as minibatch_size, samples_per_epoch and num_epochs as follows −

minibatch_size = 16
samples_per_epoch = 150
num_epochs = 30
max_samples = samples_per_epoch * num_epochs

Step 4 − Next, so that CNTK knows how to read data during training, we need to define a mapping between the input variables of the network and the streams in the minibatch source.

input_map = {
   features: training_source.streams.features,
   labels: training_source.streams.labels
}

Step 5 − Next, to log the output of the training process, initialize the progress_writer variable with a new ProgressPrinter instance. Also, initialize the trainer and provide it with the model, the loss and the learner as follows −

progress_writer = ProgressPrinter(0)
trainer = Trainer(z, loss, learner, progress_writer)

Step 6 − Finally, to start the training process, we need to invoke the training_session function as follows −

session = training_session(trainer,
   mb_source=training_source,
   mb_size=minibatch_size,
   model_inputs_to_streams=input_map,
   max_samples=max_samples,
   test_config=test_config)
session.train()
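
The constants defined in the steps above determine how much work the session does. The short sketch below works through the arithmetic in plain Python. It is only an approximation of what CNTK does internally, since it assumes every minibatch is full and ignores how the reader treats the boundary between sweeps.

```python
import math

minibatch_size = 16
samples_per_epoch = 150
num_epochs = 30

# Total number of samples the training session may consume.
max_samples = samples_per_epoch * num_epochs

# Approximate number of minibatches, assuming full minibatches of 16 samples.
batches_per_epoch = math.ceil(samples_per_epoch / minibatch_size)
total_batches = math.ceil(max_samples / minibatch_size)

print("max_samples       =", max_samples)
print("batches per epoch =", batches_per_epoch)
print("total batches     =", total_batches)
```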

We can add validation to this setup by creating a TestConfig object and assigning it to the test_config keyword argument of the training_session function.

Following are the implementation steps −

Step 1 − First, we need to import the TestConfig class from the cntk.train module as follows −

from cntk.train import TestConfig

Step 2 − Now, we need to create a new instance of TestConfig with the test_source as input −

test_config = TestConfig(test_source)

Complete Example

from cntk.io import StreamDef, StreamDefs, MinibatchSource, CTFDeserializer, INFINITELY_REPEAT
from cntk.logging import ProgressPrinter
from cntk.train import Trainer, training_session, TestConfig

def create_datasource(filename, limit=INFINITELY_REPEAT):
   labels_stream = StreamDef(field='labels', shape=3, is_sparse=False)
   feature_stream = StreamDef(field='features', shape=4, is_sparse=False)
   deserializer = CTFDeserializer(filename, StreamDefs(labels=labels_stream, features=feature_stream))
   minibatch_source = MinibatchSource(deserializer, randomize=True, max_sweeps=limit)
   return minibatch_source

training_source = create_datasource('Iris_train.ctf')
test_source = create_datasource('Iris_test.ctf', limit=1)
minibatch_size = 16
samples_per_epoch = 150
num_epochs = 30
max_samples = samples_per_epoch * num_epochs
# features, labels, z, loss and learner come from the Iris model created in previous sections
input_map = {
   features: training_source.streams.features,
   labels: training_source.streams.labels
}
progress_writer = ProgressPrinter(0)
trainer = Trainer(z, loss, learner, progress_writer)
# The test configuration must exist before it is passed to training_session
test_config = TestConfig(test_source)
session = training_session(trainer,
   mb_source=training_source,
   mb_size=minibatch_size,
   model_inputs_to_streams=input_map,
   max_samples=max_samples,
   test_config=test_config)
session.train()

Output

-------------------------------------------------------------------
average   since   average   since  examples
loss      last    metric    last
------------------------------------------------------
Learning rate per minibatch: 0.1
1.57      1.57     0.214    0.214   16
1.38      1.28     0.264    0.289   48
[………]
Finished Evaluation [1]: Minibatch[1-1]:metric = 69.65*30;

Manual minibatch loop

As we saw above, when training with the regular APIs in CNTK it is easy to measure the performance of our NN model during and after training by using metrics. Things are not that easy when working with a manual minibatch loop.

Here, we use the model given below, with 4 inputs and 3 outputs, for the Iris flower dataset, as created in previous sections −

from cntk import default_options, input_variable
from cntk.layers import Dense, Sequential
from cntk.ops import log_softmax, relu, sigmoid
from cntk.learners import sgd
model = Sequential([
   Dense(4, activation=sigmoid),
   Dense(3, activation=log_softmax)
])
features = input_variable(4)
labels = input_variable(3)
z = model(features)

Next, the loss for the model is defined as the combination of the cross-entropy loss function and the F-measure metric used in previous sections. We use the criterion_factory utility to create this as a CNTK function object, as shown below −

import cntk
from cntk.losses import cross_entropy_with_softmax, fmeasure
@cntk.Function
def criterion_factory(outputs, targets):
   loss = cross_entropy_with_softmax(outputs, targets)
   metric = fmeasure(outputs, targets, beta=1)
   return loss, metric
loss = criterion_factory(z, labels)
learner = sgd(z.parameters, 0.1)
label_mapping = {
   'Iris-setosa': 0,
   'Iris-versicolor': 1,
   'Iris-virginica': 2
}

Now that we have defined the loss function, we will see how to use it in the trainer to set up a manual training session.

Following are the implementation steps −

Step 1 − First, we need to import the required packages, numpy and pandas, to load and preprocess the data.

import pandas as pd
import numpy as np

Step 2 − Next, in order to log information during training, import the ProgressPrinter class as follows −

from cntk.logging import ProgressPrinter

Step 3 − Then, import the Trainer class from the cntk.train module as follows −

from cntk.train import Trainer

Step 4 − Next, create a new instance of ProgressPrinter as follows −

progress_writer = ProgressPrinter(0)

Step 5 − Now, we need to initialise the trainer with the model, the loss, the learner and the progress_writer as follows −

trainer = Trainer(z, loss, learner, progress_writer)

Step 6 − Next, in order to train the model, we create a loop that iterates over the dataset thirty times. This is the outer training loop.

for _ in range(0,30):

Step 7 − Now, inside the outer loop, we load the data from disk using pandas. In order to load the dataset in mini-batches, set the chunksize keyword argument to 16.

   input_data = pd.read_csv('iris.csv',
      names=['sepal_length', 'sepal_width','petal_length','petal_width', 'species'],
      index_col=False, chunksize=16)

Step 8 − Now, create an inner for loop to iterate over each of the mini-batches.

   for df_batch in input_data:

Step 9 − Now, inside this loop, read the first four columns with the iloc indexer as the features to train from, and convert them to float32 −

feature_values = df_batch.iloc[:,:4].values
feature_values = feature_values.astype(np.float32)

Step 10 − Now, read the last column as the labels to train from, as follows −

label_values = df_batch.iloc[:,-1]

Step 11 − Next, as the first step towards one-hot vectors, convert the label strings to their numeric representation as follows −

label_values = label_values.map(lambda x: label_mapping[x])

Step 12 − After that, take the numeric representation of the labels and convert it to a numpy array, so that it is easier to work with, as follows −

label_values = label_values.values

Step 13 − Now, we need to create a new numpy array of zeros that has the same number of rows as the label values we have converted, and one column for each of the three classes.

encoded_labels = np.zeros((label_values.shape[0], 3))

Step 14 − Now, in order to create one-hot encoded labels, select the columns based on the numeric label values.

encoded_labels[np.arange(label_values.shape[0]), label_values] = 1.

Step 15 − Finally, we need to invoke the train_minibatch method on the trainer and provide the processed features and labels for the minibatch.

trainer.train_minibatch({features: feature_values, labels: encoded_labels})
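The label-encoding part of the loop (Steps 11 to 14) can be tried on its own, without CNTK or a CSV file. The following is a minimal sketch that only assumes NumPy is installed; the helper name one_hot_encode and the three-row sample batch are made up for illustration and are not part of the tutorial's code.

```python
import numpy as np

# Same mapping as in the tutorial.
label_mapping = {
    'Iris-setosa': 0,
    'Iris-versicolor': 1,
    'Iris-virginica': 2,
}


def one_hot_encode(label_strings, mapping):
    """Turn a list of class names into a (rows, classes) one-hot matrix."""
    # Steps 11 and 12: class names -> integer indices in a NumPy array.
    indices = np.array([mapping[name] for name in label_strings])
    # Step 13: one row per label, one column per class, all zeros.
    encoded = np.zeros((indices.shape[0], len(mapping)), dtype=np.float32)
    # Step 14: in row i, set the column given by indices[i] to 1.
    encoded[np.arange(indices.shape[0]), indices] = 1.0
    return encoded


# Hypothetical mini-batch of three labels.
batch = ['Iris-virginica', 'Iris-setosa', 'Iris-versicolor']
encoded_labels = one_hot_encode(batch, label_mapping)
print(encoded_labels)
```

Every row of the result contains exactly one 1, in the column that matches the class index, which is the format the labels input variable of size 3 expects.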

Complete Example

from cntk import default_options, input_variable
from cntk.layers import Dense, Sequential
from cntk.ops import log_softmax, relu, sigmoid
from cntk.learners import sgd
model = Sequential([
   Dense(4, activation=sigmoid),
   Dense(3, activation=log_softmax)
])
features = input_variable(4)
labels = input_variable(3)
z = model(features)
import cntk
from cntk.losses import cross_entropy_with_softmax, fmeasure
@cntk.Function
def criterion_factory(outputs, targets):
   loss = cross_entropy_with_softmax(outputs, targets)
   metric = fmeasure(outputs, targets, beta=1)
   return loss, metric
loss = criterion_factory(z, labels)
learner = sgd(z.parameters, 0.1)
label_mapping = {
   'Iris-setosa': 0,
   'Iris-versicolor': 1,
   'Iris-virginica': 2
}
import pandas as pd
import numpy as np
from cntk.logging import ProgressPrinter
from cntk.train import Trainer
progress_writer = ProgressPrinter(0)
trainer = Trainer(z, loss, learner, progress_writer)
# Outer loop: thirty passes over the dataset.
for _ in range(0,30):
   # Re-open the file on every pass so that each pass sees all mini-batches.
   input_data = pd.read_csv('iris.csv',
      names=['sepal_length', 'sepal_width','petal_length','petal_width', 'species'],
      index_col=False, chunksize=16)
   # Inner loop: one mini-batch of 16 rows at a time.
   for df_batch in input_data:
      feature_values = df_batch.iloc[:,:4].values
      feature_values = feature_values.astype(np.float32)
      label_values = df_batch.iloc[:,-1]
      label_values = label_values.map(lambda x: label_mapping[x])
      label_values = label_values.values
      encoded_labels = np.zeros((label_values.shape[0], 3))
      encoded_labels[np.arange(label_values.shape[0]), label_values] = 1.
      trainer.train_minibatch({features: feature_values, labels: encoded_labels})

Output

-------------------------------------------------------------------
average    since    average   since  examples
loss       last      metric   last
------------------------------------------------------
Learning rate per minibatch: 0.1
1.45       1.45     -0.189    -0.189   16
1.24       1.13     -0.0382    0.0371  48
[………]

In the above output, we get both the loss and the metric during training. This is because we combined a metric and a loss in a function object and used a progress printer in the trainer configuration.

Now, in order to evaluate the model performance, we need to perform the same steps as when training the model, but this time we use an Evaluator instance to test the model. This is shown in the following Python code −

from cntk import Evaluator
evaluator = Evaluator(loss.outputs[1], [progress_writer])
input_data = pd.read_csv('iris.csv',
   names=['sepal_length', 'sepal_width','petal_length','petal_width', 'species'],
index_col=False, chunksize=16)
for df_batch in input_data:
   feature_values = df_batch.iloc[:,:4].values
   feature_values = feature_values.astype(np.float32)
   label_values = df_batch.iloc[:,-1]
   label_values = label_values.map(lambda x: label_mapping[x])
   label_values = label_values.values
   encoded_labels = np.zeros((label_values.shape[0], 3))
   encoded_labels[np.arange(label_values.shape[0]), label_values] = 1.
   evaluator.test_minibatch({ features: feature_values, labels:
      encoded_labels})
evaluator.summarize_test_progress()

Now, we will get output similar to the following −

Output

Finished Evaluation [1]: Minibatch[1-11]:metric = 74.62*143;

CNTK - Monitoring the Model

In this chapter, we will understand how to monitor a model in CNTK.

Introduction

In previous sections, we have done some validation on our NN models. But is it also necessary, and possible, to monitor our model during training?

Yes, we have already used the ProgressPrinter class to monitor our model, and there are many more ways to do so. Before going deeper into these, let's first have a look at how monitoring in CNTK works and how we can use it to detect problems in our NN model.

Callbacks in CNTK

Actually, during training and validation, CNTK allows us to specify callbacks in several spots in the API. First, let’s take a closer look at when CNTK invokes callbacks.

When does CNTK invoke callbacks?

CNTK invokes the callbacks during training and testing at the following moments −

  1. A minibatch is completed.

  2. A full sweep over the dataset is completed during training.

  3. A minibatch of testing is completed.

  4. A full sweep over the dataset is completed during testing.
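The four moments listed above can be pictured with a plain-Python toy loop. This is only a sketch of the idea, not CNTK's implementation: RecordingCallback and run_toy_session are hypothetical names, and the assumption that a test sweep follows every training sweep is made purely for illustration.

```python
class RecordingCallback:
    """Hypothetical callback that records which event fired (not a CNTK class)."""

    def __init__(self):
        self.events = []

    def on_event(self, name):
        self.events.append(name)


def run_toy_session(callback, train_batches, test_batches, sweeps):
    """Fire the four kinds of events in the order a training session would."""
    for _ in range(sweeps):
        for _ in range(train_batches):
            # ... one training minibatch would be processed here ...
            callback.on_event('train_minibatch')
        callback.on_event('train_sweep')
        for _ in range(test_batches):
            # ... one test minibatch would be evaluated here ...
            callback.on_event('test_minibatch')
        callback.on_event('test_sweep')


recorder = RecordingCallback()
run_toy_session(recorder, train_batches=2, test_batches=1, sweeps=2)
print(recorder.events)
```

A real callback such as ProgressPrinter reacts to these moments by printing or logging the loss and metric collected so far.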

Specifying callbacks

While working with CNTK, we can specify callbacks in several spots in the API. For example−

When calling train on a loss function −

Here, when we call train on a loss function, we can specify a set of callbacks through the callbacks argument as follows −

training_summary = loss.train((x_train, y_train),
   parameter_learners=[learner],
   callbacks=[progress_writer],
   minibatch_size=16, max_epochs=15)

When working with minibatch sources or using a manual minibatch loop−

In this case, we can specify callbacks for monitoring purposes while creating the Trainer as follows −

from cntk.logging import ProgressPrinter
from cntk.train import Trainer
callbacks = [
   ProgressPrinter(0)
]
# Pass the list itself and keep the Trainer class name intact.
trainer = Trainer(z, (loss, metric), learner, callbacks)

Various monitoring tools

Let us look at the different monitoring tools.

ProgressPrinter

Throughout this tutorial, ProgressPrinter is the most used monitoring tool. Some of its characteristics are −

The ProgressPrinter class implements basic console-based logging to monitor our model. It can also log to a file on disk if we want it to.

Logging to disk is especially useful in a distributed training scenario.

It is also very useful while working in a scenario where we can’t log in on the console to see the output of our Python program.

With the following code, we can create an instance of ProgressPrinter that writes to a file −

ProgressPrinter(0, log_to_file='test.txt')

The log file will contain output similar to what we have seen in the earlier sections −

test.txt
CNTKCommandTrainInfo: train : 300
CNTKCommandTrainInfo: CNTKNoMoreCommands_Total : 300
CNTKCommandTrainBegin: train
-------------------------------------------------------------------
average since average since examples
loss last metric last
------------------------------------------------------
Learning rate per minibatch: 0.1
1.45 1.45 -0.189 -0.189 16
1.24 1.13 -0.0382 0.0371 48
[………]

TensorBoard

One of the disadvantages of ProgressPrinter is that it is hard to get a good view of how the loss and metric progress over time. TensorBoardProgressWriter is a great alternative to the ProgressPrinter class in CNTK.

Before using it, we need to install TensorBoard with the following command −

pip install tensorboard

Now, in order to use TensorBoard, we need to set up TensorBoardProgressWriter in our training code as follows−

import time
from cntk.logging import TensorBoardProgressWriter
tensorbrd_writer = TensorBoardProgressWriter(log_dir='logs/{}'.format(time.time()), freq=1, model=z)

It is good practice to call the close method on the TensorBoardProgressWriter instance once the training of the NN model is done.

We can visualise the TensorBoard logging data with the following command −

tensorboard --logdir logs

CNTK - Convolutional Neural Network

In this chapter, let us study how to construct a Convolutional Neural Network (CNN) in CNTK.

Introduction

Convolutional neural networks (CNNs) are also made up of neurons that have learnable weights and biases. In this respect, they are like ordinary neural networks (NNs).

If we recall how ordinary NNs work, every neuron receives one or more inputs, takes a weighted sum and passes it through an activation function to produce the final output. The question then arises: if CNNs and ordinary NNs have so many similarities, what makes these two kinds of network different from each other?

What makes them different is the treatment of the input data and the types of layers. In an ordinary NN the structure of the input data is ignored, and all the data is converted into a 1-D array before it is fed into the network.

A Convolutional Neural Network architecture, in contrast, can take the 2D structure of images into account and process it, which allows it to extract properties that are specific to images. Moreover, CNNs have one or more convolutional layers and pooling layers, which are the main building blocks of CNNs.

These layers are followed by one or more fully connected layers, as in standard multilayer NNs. So, we can think of a CNN as a special case of fully connected networks.

Convolutional Neural Network (CNN) architecture

The architecture of a CNN is basically a list of layers that transforms a 3-dimensional input volume (the width, height and depth of the image) into a 3-dimensional output volume. One important point to note here is that every neuron in the current layer is connected to a small patch of the output from the previous layer, which is like overlaying an N*N filter on the input image.

It uses M filters, which are basically feature extractors that extract features like edges, corners and so on. Following are the layers [INPUT-CONV-RELU-POOL-FC] that are used to construct Convolutional neural networks (CNNs) −

  1. INPUT − As the name implies, this layer holds the raw pixel values, that is, the data of the image as it is. For example, INPUT [64×64×3] is a 3-channel RGB image of width 64, height 64 and depth 3.

  2. CONV − This layer is one of the building blocks of CNNs, as most of the computation is done in this layer. For example, if we use 6 filters on the above mentioned INPUT [64×64×3], this may result in the volume [64×64×6].

  3. RELU − Also called the rectified linear unit layer, it applies an activation function to the output of the previous layer. In other words, RELU adds a non-linearity to the network.

  4. POOL − This layer, the pooling layer, is another building block of CNNs. Its main task is down-sampling, which means it operates independently on every slice of the input and resizes it spatially.

  5. FC − This is the Fully Connected layer or, more specifically, the output layer. It is used to compute the output class scores, and the resulting output is a volume of size 1*1*L, where L is the number of class scores (one per class).
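To make the CONV, RELU and POOL stages above more concrete, here is a small sketch in plain Python with no CNTK involved. It is an illustration under stated assumptions rather than how CNTK computes these layers: the function names, the 5×5 image and the 2×2 filter are all made up, the convolution is unpadded with stride 1, and the pooling windows do not overlap.

```python
def convolve2d(image, kernel):
    """Valid (unpadded) 2-D cross-correlation of a square image, stride 1."""
    k = len(kernel)
    out_size = len(image) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(out_size)
        ]
        for r in range(out_size)
    ]


def relu(feature_map):
    """Replace negative activations with zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size):
    """Non-overlapping max pooling (stride equal to the window size)."""
    n = len(feature_map) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(n)
        ]
        for r in range(n)
    ]


# Made-up 5x5 image: dark on the left (0), bright on the right (1).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Made-up 2x2 filter that responds to a dark-to-bright step from left to right.
kernel = [[-1, 1],
          [-1, 1]]

conv = convolve2d(image, kernel)   # 4x4 feature map
activated = relu(conv)             # negative responses removed
pooled = max_pool(activated, 2)    # down-sampled to 2x2
print(conv)
print(pooled)
```

The filter responds only where the vertical edge is, and pooling keeps that response while halving the width and height of the feature map.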

The diagram below represents the typical architecture of CNNs −

[Figure: CNN architecture]

Creating CNN structure

We have seen the architecture and the basics of CNNs; now we are going to build a convolutional network using CNTK. Here, we will first see how to put together the structure of the CNN, and then we will look at how to train its parameters.

Finally, we will see how we can improve the neural network by changing its structure with different layer setups. We are going to use the MNIST image dataset.

So, first let’s create a CNN structure. Generally, when we build a CNN for recognizing patterns in images, we do the following−

  1. We use a combination of convolution and pooling layers.

  2. We add one or more hidden layers at the end of the network.

  3. Finally, we finish the network with a softmax layer for classification.

With the following steps, we can build the network structure −

Step 1− First, we need to import the required layers for CNN.

from cntk.layers import Convolution2D, Sequential, Dense, MaxPooling

Step 2− Next, we need to import the activation functions for CNN.

from cntk.ops import log_softmax, relu

Step 3 − After that, in order to initialize the convolutional layers later, we need to import the glorot_uniform initializer as follows −

from cntk.initializer import glorot_uniform

Step 4 − Next, to create input variables, import the input_variable function. Also import the default_options function, which makes the configuration of the NN a bit easier.

from cntk import input_variable, default_options

Step 5 − Now, to store the input images, create a new input_variable. It contains three channels, namely red, green and blue, and has a size of 28 by 28 pixels.

features = input_variable((3,28,28))

Step 6 − Next, we need to create another input_variable to store the labels to predict.

labels = input_variable(10)

Step 7 − Now, we need to set the default_options for the NN, using glorot_uniform as the initialization function and relu as the default activation.

with default_options(initialization=glorot_uniform, activation=relu):

Step 8− Next, in order to set the structure of the NN, we need to create a new Sequential layer set.

Step 9 − Now, within the Sequential layer set, we need to add a Convolution2D layer with a filter_shape of 5, a strides setting of 1 and 8 filters. Also, enable padding, so that the image is padded to retain the original dimensions.

model = Sequential([
Convolution2D(filter_shape=(5,5), strides=(1,1), num_filters=8, pad=True),

Step 10− Now it’s time to add a MaxPooling layer with filter_shape of 2, and a strides setting of 2 to compress the image by half.

MaxPooling(filter_shape=(2,2), strides=(2,2)),

Step 11 − Now, as we did in step 9, we need to add another Convolution2D layer with a filter_shape of 5 and a strides setting of 1, this time with 16 filters. Also, enable padding, so that the size of the image produced by the previous pooling layer is retained.

Convolution2D(filter_shape=(5,5), strides=(1,1), num_filters=16, pad=True),

Step 12 − Now, as we did in step 10, add another MaxPooling layer with a filter_shape of 3 and a strides setting of 3 to reduce each side of the image to roughly a third.

MaxPooling(filter_shape=(3,3), strides=(3,3)),

Step 13 − Finally, add a Dense layer with ten neurons for the 10 possible classes the network can predict. In order to turn the network into a classification model, use the log_softmax activation function.

Dense(10, activation=log_softmax)
])

Complete Example for creating CNN structure

from cntk.layers import Convolution2D, Sequential, Dense, MaxPooling
from cntk.ops import log_softmax, relu
from cntk.initializer import glorot_uniform
from cntk import input_variable, default_options
features = input_variable((3,28,28))
labels = input_variable(10)
with default_options(initialization=glorot_uniform, activation=relu):
   model = Sequential([
      Convolution2D(filter_shape=(5,5), strides=(1,1), num_filters=8, pad=True),
      MaxPooling(filter_shape=(2,2), strides=(2,2)),
      Convolution2D(filter_shape=(5,5), strides=(1,1), num_filters=16, pad=True),
      MaxPooling(filter_shape=(3,3), strides=(3,3)),
      Dense(10, activation=log_softmax)
   ])
z = model(features)
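It can help to trace how the shape of the data changes from layer to layer in this structure. The sketch below does the arithmetic in plain Python; it assumes the usual output-size formulas (padding keeps the size for stride 1, and an unpadded window must fit entirely inside its input) and that the pooling layers are unpadded. The helper output_side and the layers table are illustrative names, not CNTK API.

```python
import math


def output_side(side, filter_side, stride, pad):
    """Spatial size along one axis after a convolution or pooling layer."""
    if pad:
        # With automatic padding only the stride shrinks the output.
        return math.ceil(side / stride)
    # Without padding the window must fit entirely inside the input.
    return (side - filter_side) // stride + 1


# (layer name, output channels or None to keep them, filter side, stride, pad)
layers = [
    ('Convolution2D', 8, 5, 1, True),
    ('MaxPooling', None, 2, 2, False),
    ('Convolution2D', 16, 5, 1, True),
    ('MaxPooling', None, 3, 3, False),
]

channels, side = 3, 28          # features = input_variable((3,28,28))
shapes = [(channels, side, side)]
for name, out_channels, filter_side, stride, pad in layers:
    side = output_side(side, filter_side, stride, pad)
    if out_channels is not None:
        channels = out_channels
    shapes.append((channels, side, side))
    print(name, '->', shapes[-1])

dense_inputs = channels * side * side
print('values fed to Dense(10):', dense_inputs)
```

Under these assumptions the second pooling layer turns a 14 by 14 map into a 4 by 4 map, because only four complete 3 by 3 windows fit along each side, so the final Dense layer receives 16 × 4 × 4 = 256 values.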

Training CNN with images

Now that we have created the structure of the network, it is time to train it. But before we start training, we need to set up minibatch sources, because training a NN on images requires more memory than most computers have, so the images have to be loaded in minibatches.

We have already created minibatch sources in previous sections. The create_datasource function, which builds a minibatch source from a folder of images, is listed in the complete implementation example below. With this function, we can create two separate data sources, one for training and one for testing −

train_datasource = create_datasource('mnist_train')
test_datasource = create_datasource('mnist_test', max_sweeps=1, train=False)

Now that we have prepared the images, we can start training our NN. As we did in previous sections, we can use the train method on the loss function to kick off the training. First, we define the loss and the learner −

from cntk import Function
from cntk.losses import cross_entropy_with_softmax
from cntk.metrics import classification_error
from cntk.learners import sgd
@Function
def criterion_factory(output, targets):
   loss = cross_entropy_with_softmax(output, targets)
   metric = classification_error(output, targets)
   return loss, metric
loss = criterion_factory(z, labels)
learner = sgd(z.parameters, lr=0.2)

With the previous code, we have set up the loss and the learner for the NN. The following code will train and validate the NN −

from cntk.logging import ProgressPrinter
from cntk.train import TestConfig
progress_writer = ProgressPrinter(0)
test_config = TestConfig(test_datasource)
input_map = {
   features: train_datasource.streams.features,
   labels: train_datasource.streams.labels
}
loss.train(train_datasource,
   max_epochs=10,
   minibatch_size=64,
   epoch_size=60000,
   parameter_learners=[learner],
   model_inputs_to_streams=input_map,
   callbacks=[progress_writer, test_config])

Complete Implementation Example

from cntk.layers import Convolution2D, Sequential, Dense, MaxPooling
from cntk.ops import log_softmax, relu
from cntk.initializer import glorot_uniform
from cntk import input_variable, default_options
features = input_variable((3,28,28))
labels = input_variable(10)
with default_options(initialization=glorot_uniform, activation=relu):
   model = Sequential([
      Convolution2D(filter_shape=(5,5), strides=(1,1), num_filters=8, pad=True),
      MaxPooling(filter_shape=(2,2), strides=(2,2)),
      Convolution2D(filter_shape=(5,5), strides=(1,1), num_filters=16, pad=True),
      MaxPooling(filter_shape=(3,3), strides=(3,3)),
      Dense(10, activation=log_softmax)
   ])
z = model(features)
import os
from cntk.io import MinibatchSource, StreamDef, StreamDefs, ImageDeserializer, INFINITELY_REPEAT
import cntk.io.transforms as xforms
def create_datasource(folder, train=True, max_sweeps=INFINITELY_REPEAT):
   mapping_file = os.path.join(folder, 'mapping.bin')
   image_transforms = []
   if train:
      image_transforms += [
         xforms.crop(crop_type='randomside', side_ratio=0.8),
         xforms.scale(width=28, height=28, channels=3, interpolations='linear')
      ]
   stream_definitions = StreamDefs(
      features=StreamDef(field='image', transforms=image_transforms),
      labels=StreamDef(field='label', shape=10)
   )
   deserializer = ImageDeserializer(mapping_file, stream_definitions)
   return MinibatchSource(deserializer, max_sweeps=max_sweeps)
train_datasource = create_datasource('mnist_train')
test_datasource = create_datasource('mnist_test', max_sweeps=1, train=False)
from cntk import Function
from cntk.losses import cross_entropy_with_softmax
from cntk.metrics import classification_error
from cntk.learners import sgd
@Function
def criterion_factory(output, targets):
   loss = cross_entropy_with_softmax(output, targets)
   metric = classification_error(output, targets)
   return loss, metric
loss = criterion_factory(z, labels)
learner = sgd(z.parameters, lr=0.2)
from cntk.logging import ProgressPrinter
from cntk.train import TestConfig
progress_writer = ProgressPrinter(0)
test_config = TestConfig(test_datasource)
input_map = {
   features: train_datasource.streams.features,
   labels: train_datasource.streams.labels
}
loss.train(train_datasource,
   max_epochs=10,
   minibatch_size=64,
   epoch_size=60000,
   parameter_learners=[learner],
   model_inputs_to_streams=input_map,
   callbacks=[progress_writer, test_config])

Output

-------------------------------------------------------------------
average  since  average  since  examples
loss     last   metric   last
------------------------------------------------------
Learning rate per minibatch: 0.2
142      142      0.922   0.922    64
1.35e+06 1.51e+07 0.896   0.883    192
[………]

Image transformations

As we have seen, NNs used for image recognition are difficult to train, and they also require a lot of data. One more issue is that they tend to overfit on the images used during training. For example, if we only have photos of faces in an upright position, our model will have a hard time recognizing faces that are rotated in another direction.

In order to overcome this problem, we can use image augmentation. CNTK supports specific transforms when creating minibatch sources for images. We can use several transformations, as follows −

  1. We can randomly crop the images used for training with just a few lines of code.

  2. We can also apply scale and color transformations.

The following Python code shows how we can change the list of transformations by including a cropping transformation in the function used to create the minibatch source.

import os
from cntk.io import MinibatchSource, StreamDef, StreamDefs, ImageDeserializer, INFINITELY_REPEAT
import cntk.io.transforms as xforms
def create_datasource(folder, train=True, max_sweeps=INFINITELY_REPEAT):
   mapping_file = os.path.join(folder, 'mapping.bin')
   image_transforms = []
   if train:
      image_transforms += [
         xforms.crop(crop_type='randomside', side_ratio=0.8),
         xforms.scale(width=28, height=28, channels=3, interpolations='linear')
      ]
   stream_definitions = StreamDefs(
      features=StreamDef(field='image', transforms=image_transforms),
      labels=StreamDef(field='label', shape=10)
   )
   deserializer = ImageDeserializer(mapping_file, stream_definitions)
   return MinibatchSource(deserializer, max_sweeps=max_sweeps)

With the above code, the function includes a set of image transforms, so that during training the image is randomly cropped and we get more variations of each image.
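The effect of a random crop can be illustrated without CNTK. The following plain-Python sketch cuts a random square whose side is 0.8 times the shorter image side, which is one plausible reading of the crop settings used above; it is not CNTK's implementation, it leaves out the scaling step, and random_side_crop and the 10×10 "image" are made-up names and data.

```python
import random


def random_side_crop(image, side_ratio, rng):
    """Cut a random square whose side is side_ratio times the shorter image side."""
    height, width = len(image), len(image[0])
    side = int(min(height, width) * side_ratio)
    top = rng.randint(0, height - side)      # randint bounds are inclusive
    left = rng.randint(0, width - side)
    return [row[left:left + side] for row in image[top:top + side]]


# Made-up 10x10 "image" whose pixel value encodes its position (row * 10 + column).
image = [[r * 10 + c for c in range(10)] for r in range(10)]
rng = random.Random(0)                       # seeded so that a run is repeatable

crops = [random_side_crop(image, 0.8, rng) for _ in range(3)]
for crop in crops:
    print(len(crop), 'x', len(crop[0]), 'top-left pixel:', crop[0][0])
```

Each call returns an 8 by 8 window taken from a different position, so the network sees slightly shifted versions of the same picture during training.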

CNTK - Recurrent Neural Network

Now, let us understand how to construct a Recurrent Neural Network (RNN) in CNTK.

Introduction

We learned how to classify images with a neural network, which is one of the iconic tasks in deep learning. Another area where neural networks excel, and where a lot of research is happening, is Recurrent Neural Networks (RNNs). Here, we will learn what an RNN is and how it can be used in scenarios where we need to deal with time-series data.

What is Recurrent Neural Network?

Recurrent neural networks (RNNs) may be defined as a special breed of NNs that are capable of reasoning over time. RNNs are mainly used in scenarios where we need to deal with values that change over time, i.e. time-series data. To understand this better, let's make a small comparison between regular neural networks and recurrent neural networks −

  1. In a regular neural network, we provide one input, which results in one prediction. This makes a regular network a poor fit for a task such as translating text, where both the input and the output are sequences.

  2. In recurrent neural networks, on the other hand, we can provide a sequence of samples that results in a single prediction. More generally, using RNNs we can predict an output sequence based on an input sequence. For example, there have been quite a few successful experiments with RNNs in translation tasks.

Uses of Recurrent Neural Network

RNNs can be used in several ways. Some of them are as follows −

Predicting a single output

Before diving into the steps by which an RNN predicts a single output based on a sequence, let's see what a basic RNN looks like −

[Figure: a basic RNN that predicts a single output]

As we can see in the above diagram, an RNN contains a loopback connection to the input, and whenever we feed it a sequence of values, it processes each element in the sequence as a time step.

Moreover, because of the loopback connection, an RNN can combine the generated output with the input for the next element in the sequence. In this way, the RNN builds a memory over the whole sequence, which can be used to make a prediction.

In order to make a prediction with an RNN, we can perform the following steps −

  1. First, to create an initial hidden state, we need to feed the first element of the input sequence.

  2. After that, to produce an updated hidden state, we need to take the initial hidden state and combine it with the second element in the input sequence.

  3. Finally, to produce the final hidden state and to predict the output of the RNN, we need to take the updated hidden state and combine it with the final element in the input sequence.

In this way, with the help of the loopback connection, we can teach an RNN to recognize patterns that happen over time.
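The steps above can be sketched with a tiny recurrent loop in plain Python. This is a toy under stated assumptions, not CNTK code: the hidden state is a single number, tanh is assumed as the activation, the memory starts at zero before the first element, and the weights W_H, W_X and W_Y are made-up values that a real network would learn.

```python
import math


def rnn_step(hidden, x, w_h, w_x):
    """One recurrent update: mix the previous hidden state with the current input."""
    return math.tanh(w_h * hidden + w_x * x)


def predict_single_output(sequence, w_h, w_x, w_y):
    """Run the whole sequence through the loop, then predict from the final state."""
    hidden = 0.0                       # empty memory before the first element
    for x in sequence:                 # one time step per element
        hidden = rnn_step(hidden, x, w_h, w_x)
    return w_y * hidden                # output computed from the final hidden state


# Made-up scalar weights; a trained network would learn these values.
W_H, W_X, W_Y = 0.5, 1.0, 2.0

rising = predict_single_output([0.1, 0.2, 0.3], W_H, W_X, W_Y)
falling = predict_single_output([0.3, 0.2, 0.1], W_H, W_X, W_Y)
print(rising, falling)
```

The two sequences contain the same values in a different order, yet they produce different predictions, which shows that the hidden state carries information about what came earlier in the sequence.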

Predicting a sequence

The basic RNN model discussed above can be extended to other use cases as well. For example, we can use it to predict a sequence of values based on a single input. In this scenario, in order to make a prediction with an RNN, we can perform the following steps −

  1. First, to create an initial hidden state and predict the first element in the output sequence, we need to feed an input sample into the neural network.

  2. After that, to produce an updated hidden state and the second element in the output sequence, we need to combine the initial hidden state with the same sample.

  3. At last, to update the hidden state one more time and predict the final element in output sequence, we feed the sample another time.

Predicting sequences

We have seen how to predict a single value based on a sequence and how to predict a sequence based on a single value. Now let's see how we can predict a sequence from a sequence. In this scenario, in order to make a prediction with an RNN, we can perform the following steps −

  1. First, to create an initial hidden state and predict the first element in the output sequence, we need to take the first element in the input sequence.

  2. After that, to update the hidden state and predict the second element in the output sequence, we need to take the initial hidden state and combine it with the second element in the input sequence.

  3. At last, to predict the final element in the output sequence, we need to take the updated hidden state and the final element in the input sequence.
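The two sequence-producing scenarios, one input to many outputs and many inputs to many outputs, differ only in what is fed at each time step. The sketch below shows both in plain Python under the same toy assumptions as a scalar RNN: a single-number hidden state that starts at zero, tanh as the activation, and made-up weights. None of the function names come from CNTK.

```python
import math


def rnn_step(hidden, x, w_h, w_x):
    """One recurrent update of a scalar hidden state."""
    return math.tanh(w_h * hidden + w_x * x)


def predict_sequence_from_sample(sample, steps, w_h, w_x, w_y):
    """One input, many outputs: feed the same sample at every time step."""
    hidden, outputs = 0.0, []
    for _ in range(steps):
        hidden = rnn_step(hidden, sample, w_h, w_x)
        outputs.append(w_y * hidden)   # one prediction per time step
    return outputs


def predict_sequence_from_sequence(sequence, w_h, w_x, w_y):
    """Many inputs, many outputs: emit one prediction per input element."""
    hidden, outputs = 0.0, []
    for x in sequence:
        hidden = rnn_step(hidden, x, w_h, w_x)
        outputs.append(w_y * hidden)
    return outputs


# Made-up scalar weights for illustration only.
W_H, W_X, W_Y = 0.5, 1.0, 2.0

from_sample = predict_sequence_from_sample(0.5, 3, W_H, W_X, W_Y)
from_sequence = predict_sequence_from_sequence([0.5, 0.1, -0.2], W_H, W_X, W_Y)
print(from_sample)
print(from_sequence)
```

Even when the same sample is fed three times, the three outputs differ, because the hidden state changes after every step.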

Working of RNN

To understand how recurrent neural networks (RNNs) work, we first need to understand how the recurrent layers in the network work. So let's first discuss how we can predict the output with a standard recurrent layer.

Predicting output with standard RNN layer

As we discussed earlier, a basic layer in an RNN is quite different from a regular layer in a neural network. In the previous section, we also showed the basic architecture of an RNN in a diagram. In order to update the hidden state for the first time step in the sequence, we can use the following formula −

[Formula image: hidden state update for the first time step]

In the above equation, we calculate the new hidden state by calculating the dot product between the initial hidden state and a set of weights.

For the next step, the hidden state from the current time step is used as the initial hidden state for the next time step in the sequence. Therefore, to update the hidden state for the second time step, we can repeat the calculation performed in the first time step as follows −

first step

Next, we can repeat the process of updating the hidden state for the third and final step in the sequence as follows −

last step

Once we have processed all the steps in the sequence, we can calculate the output as follows −

output

The above formula uses a third set of weights and the hidden state from the final time step.
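The hidden-state updates and the final output described above can be sketched in plain Python. This is only an illustration, not CNTK's implementation: the weight names `w_h`, `w_x` and `w_o`, the tanh activation, the zero initial state and the use of scalars instead of vectors are all assumptions made for readability.

```python
import math

def rnn_step(h_prev, x, w_h, w_x):
    """One hidden-state update from the previous state and the current input."""
    # tanh is an assumed activation; the source text does not name one.
    return math.tanh(w_h * h_prev + w_x * x)

def rnn_predict(inputs, w_h, w_x, w_o, h0=0.0):
    """Process every time step, then compute one output from the final state."""
    h = h0
    for x in inputs:
        # The state produced at this step is the initial state of the next step.
        h = rnn_step(h, x, w_h, w_x)
    # The third set of weights maps the final hidden state to the output.
    return w_o * h

# Hypothetical three-step sequence and weights, chosen only for illustration.
prediction = rnn_predict([0.1, 0.2, 0.3], w_h=0.5, w_x=1.0, w_o=2.0)
```

The same `rnn_step` function is applied at every time step with the same weights, which is what makes the layer recurrent.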

Advanced Recurrent Units

The main issue with the basic recurrent layer is the vanishing gradient problem, which makes it poor at learning long-term correlations. In simple words, a basic recurrent layer does not handle long sequences very well. For this reason, the following recurrent layer types are much better suited for working with longer sequences −

Long-Short Term Memory (LSTM)

long short term memory

Long short-term memory (LSTM) networks were introduced by Hochreiter & Schmidhuber. They solve the problem of getting a basic recurrent layer to remember things for a long time. The architecture of an LSTM is shown in the diagram above. As we can see, it has input neurons, memory cells, and output neurons. To combat the vanishing gradient problem, LSTM networks use an explicit memory cell (which stores the previous values) and the following gates −

  1. Forget gate − As the name implies, it tells the memory cell to forget the previous values. The memory cell stores the values until the forget gate tells it to forget them.

  2. Input gate − As the name implies, it adds new information to the cell.

  3. Output gate − As the name implies, it decides when to pass the vectors along from the cell to the next hidden state.
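To make the role of the three gates concrete, here is a minimal sketch of a single LSTM time step in plain Python. It follows the commonly used LSTM formulation rather than CNTK's internal code; the parameter dictionary `p`, its key names, and the use of scalars instead of vectors are assumptions for illustration.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM time step on scalars; p maps a name to (w_x, w_h, bias)."""
    def unit(name, activation):
        w_x, w_h, b = p[name]
        return activation(w_x * x + w_h * h_prev + b)

    f = unit("forget", sigmoid)        # fraction of the old cell value to keep
    i = unit("input", sigmoid)         # fraction of the new content to write
    g = unit("candidate", math.tanh)   # the new content itself
    o = unit("output", sigmoid)        # fraction of the cell passed to the hidden state
    c = f * c_prev + i * g             # updated memory cell
    h = o * math.tanh(c)               # updated hidden state
    return h, c

# Hypothetical parameters: all weights and biases zero, so every gate is 0.5.
zeros = {name: (0.0, 0.0, 0.0) for name in ("forget", "input", "candidate", "output")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, p=zeros)
```

With these parameters the forget gate keeps half of the old cell value and the candidate contributes nothing, so `c` is half of `c_prev`.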

Gated Recurrent Units (GRUs)

gated recurrent units

Gated recurrent units (GRUs) are a slight variation of LSTM networks. They have one less gate and are wired slightly differently than LSTMs. The architecture is shown in the diagram above. It has input neurons, gated memory cells, and output neurons. A GRU network has the following two gates −

  1. Update gate − It determines two things: how much information to keep from the last state, and how much information to let in from the previous layer.

  2. Reset gate − Its functionality is much like that of the forget gate in an LSTM network. The only difference is that it is located slightly differently.

Compared to LSTM networks, GRU networks are slightly faster and easier to run.
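A single GRU time step can be sketched the same way. This is a minimal illustration of one common GRU formulation, not CNTK's implementation: the parameter dictionary `p`, its key names, the scalar values, and the convention that the update gate weights the new candidate (some references weight the old state instead) are assumptions.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gru_step(x, h_prev, p):
    """One GRU time step on scalars; p maps a name to (w_x, w_h, bias)."""
    w_x, w_h, b = p["update"]
    z = sigmoid(w_x * x + w_h * h_prev + b)   # update gate
    w_x, w_h, b = p["reset"]
    r = sigmoid(w_x * x + w_h * h_prev + b)   # reset gate
    w_x, w_h, b = p["candidate"]
    # The reset gate scales how much of the old state feeds the candidate.
    h_cand = math.tanh(w_x * x + w_h * (r * h_prev) + b)
    # The update gate blends the old state with the candidate.
    return (1.0 - z) * h_prev + z * h_cand

# Hypothetical parameters: all zero, so both gates are 0.5 and the candidate is 0.
zeros = {name: (0.0, 0.0, 0.0) for name in ("update", "reset", "candidate")}
h = gru_step(x=1.0, h_prev=0.8, p=zeros)
```

Unlike the LSTM, there is no separate memory cell: the hidden state itself carries the memory, which is one reason a GRU has fewer parameters.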

Creating RNN structure

Before we can start making predictions about the output from any of our data sources, we first need to construct the RNN. Constructing an RNN is quite similar to building a regular neural network, as we did in the previous section. The following code sets up the imports and constants used in the rest of this section −

from cntk.losses import squared_error
from cntk.io import CTFDeserializer, MinibatchSource, INFINITELY_REPEAT, StreamDefs, StreamDef
from cntk.learners import adam
from cntk.logging import ProgressPrinter
from cntk.train import TestConfig
BATCH_SIZE = 14 * 10
EPOCH_SIZE = 12434
EPOCHS = 10

Stacking multiple layers

We can also stack multiple recurrent layers in CNTK. For example, we can use the following combination of layers −

from cntk import sequence, default_options, input_variable
from cntk.layers import Recurrence, LSTM, Dropout, Dense, Sequential, Fold
features = sequence.input_variable(1)
with default_options(initial_state = 0.1):
   model = Sequential([
      Fold(LSTM(15)),
      Dense(1)
   ])(features)
target = input_variable(1, dynamic_axes=model.dynamic_axes)

As we can see in the above code, there are two ways to model an RNN in CNTK −

  1. First, if we only want the final output of a recurrent layer, we can use the Fold layer in combination with a recurrent layer, such as GRU, LSTM, or even RNNStep.

  2. Second, as an alternative way, we can also use the Recurrence block.
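The difference between the two options can be illustrated with an analogy in plain Python. This is not CNTK code: `Fold` behaves like a reduce that returns only the final state, while `Recurrence` behaves like a scan that returns the state after every time step. The `step` function and its weights are hypothetical.

```python
import math
from functools import reduce
from itertools import accumulate

def step(h_prev, x):
    """A stand-in for a recurrent unit such as LSTM, GRU or RNNStep."""
    return math.tanh(0.5 * h_prev + x)

sequence = [0.1, 0.2, 0.3]
h0 = 0.0

# Recurrence-like: one state per time step (the leading h0 is dropped).
all_states = list(accumulate([h0] + sequence, step))[1:]

# Fold-like: only the state after the last time step.
final_state = reduce(step, sequence, h0)
```

When stacking recurrent layers, every layer except the last must pass on a full sequence, so the scan-like form feeds the next layer and the fold-like form is applied last.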

Training RNN with time series data

Once we have built the model, let’s see how we can train the RNN in CNTK −

from cntk import Function
@Function
def criterion_factory(z, t):
   loss = squared_error(z, t)
   metric = squared_error(z, t)
   return loss, metric
loss = criterion_factory(model, target)
learner = adam(model.parameters, lr=0.005, momentum=0.9)

Now, to load the data into the training process, we have to deserialize sequences from a set of CTF files. The following code defines the create_datasource function, a useful utility function for creating both the training and the test data source.

def create_datasource(filename, sweeps=INFINITELY_REPEAT):
   # Map the 'target' and 'features' fields of the CTF file to dense streams.
   target_stream = StreamDef(field='target', shape=1, is_sparse=False)
   features_stream = StreamDef(field='features', shape=1, is_sparse=False)
   deserializer = CTFDeserializer(filename, StreamDefs(features=features_stream, target=target_stream))
   datasource = MinibatchSource(deserializer, randomize=True, max_sweeps=sweeps)
   return datasource
# Provide the location of the training file created from the dataset.
train_datasource = create_datasource('Training data filename.ctf')
# Provide the location of the test file created from the dataset.
test_datasource = create_datasource('Test filename.ctf', sweeps=1)
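For reference, the CTF files read above are plain text: each line starts with a sequence ID, followed by one or more `|stream values` fields, and consecutive lines with the same ID form one sequence. The helper below is a hypothetical sketch of how such lines could be produced for the `features` and `target` streams; the function name and the input layout are assumptions, and it is not part of CNTK.

```python
def to_ctf_lines(sequences):
    """Format (features, target) pairs as CTF text lines.

    sequences: list of (list_of_floats, float) pairs, one pair per sequence.
    """
    lines = []
    for seq_id, (features, target) in enumerate(sequences):
        for position, value in enumerate(features):
            line = f"{seq_id} |features {value}"
            if position == 0:
                # The single target value is attached to the first line of the sequence.
                line += f" |target {target}"
            lines.append(line)
    return lines

# Two hypothetical sequences of different lengths.
ctf_lines = to_ctf_lines([([0.1, 0.2], 0.3), ([0.5], 0.6)])
```

Writing the resulting lines to a file, one per line, gives input in the layout that `CTFDeserializer` expects for the stream definitions used in `create_datasource`.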

Now that we have set up the data sources, the model and the loss function, we can start the training process. It is quite similar to what we did in previous sections with basic neural networks.

progress_writer = ProgressPrinter(0)
test_config = TestConfig(test_datasource)
input_map = {
   features: train_datasource.streams.features,
   target: train_datasource.streams.target
}
history = loss.train(
   train_datasource,
   epoch_size=EPOCH_SIZE,
   parameter_learners=[learner],
   model_inputs_to_streams=input_map,
   callbacks=[progress_writer, test_config],
   minibatch_size=BATCH_SIZE,
   max_epochs=EPOCHS
)

We will get output similar to the following −

Output−

average  since  average  since  examples
loss      last  metric  last
------------------------------------------------------
Learning rate per minibatch: 0.005
0.4      0.4    0.4      0.4      19
0.4      0.4    0.4      0.4      59
0.452    0.495  0.452    0.495   129
[…]

Validating the model

Predicting with an RNN is quite similar to making predictions with any other CNTK model. The only difference is that we need to provide sequences rather than single samples.

Now that our RNN is finally done with training, we can validate the model by testing it with a few sample sequences as follows −

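import pickle
# NORMALIZE is assumed to be defined earlier as the scaling constant used when preparing the dataset.
with open('test_samples.pkl', 'rb') as test_file:
   test_samples = pickle.load(test_file)
model(test_samples) * NORMALIZE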

Output−

array([[ 8081.7905],
[16597.693 ],
[13335.17 ],
...,
[11275.804 ],
[15621.697 ],
[16875.555 ]], dtype=float32)