Microsoft Cognitive Toolkit Tutorial

CNTK - Neural Network Regression

This chapter will help you understand neural network regression with regards to CNTK.

Introduction

As we know, in order to predict a numeric value from one or more predictor variables, we use regression. Let’s take the example of predicting the median house value in, say, one of 100 towns. To do so, we have data that includes −

  1. A crime statistic for each town.

  2. The age of the houses in each town.

  3. A measure of the distance from each town to a prime location.

  4. The student-to-teacher ratio in each town.

  5. A racial demographic statistic for each town.

  6. The median house value in each town.

Based on the five predictor variables, we would like to predict the median house value. For this, we can create a linear regression model along the lines of −

Y = a0 + a1(crime) + a2(house-age) + a3(distance) + a4(ratio) + a5(racial)

In the above equation −

Y is the predicted median value

a0 is a constant, and

a1 through a5 are constants associated with the five predictors discussed above.
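To make the linear model concrete, here is a minimal sketch that evaluates the equation above; the coefficient values a0 through a5 are made-up illustrations, not fitted values −

import numpy as np

# illustrative (not fitted) coefficients a0..a5 for the linear model above
a = np.array([20.0, -0.5, -0.05, -0.3, -0.4, 0.01])
# one town's predictor values: crime, house-age, distance, ratio, racial
x = np.array([0.09, 50.0, 4.5, 17.0, 350.0])
# Y = a0 + a1(crime) + a2(house-age) + a3(distance) + a4(ratio) + a5(racial)
Y = a[0] + np.dot(a[1:], x)
print("Predicted median value: $%0.2f (x1000)" % Y)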

An alternative approach is to use a neural network, which can produce a more accurate prediction model.

Here, we will create a neural network regression model using CNTK.

Loading Dataset

To implement neural network regression using CNTK, we will use the Boston area house values dataset. The dataset can be downloaded from the UCI Machine Learning Repository, which is available at https://archive.ics.uci.edu/. This dataset has a total of 14 variables and 506 instances.

But for our implementation program, we are going to use six of the 14 variables and 100 of the instances. Of the six variables, five serve as predictors and one as the value to predict. Of the 100 instances, we will use 80 for training and 20 for testing. The value we want to predict is the median house price in a town. Let’s see the five predictors we will be using −

  1. Crime per capita in the town − we would expect smaller values of this predictor to be associated with higher house values.

  2. Proportion of owner-occupied units built before 1940 − we would expect smaller values of this predictor to be associated with higher house values, because a larger value means an older house.

  3. Weighted distance of the town to five Boston employment centers.

  4. Area school pupil-to-teacher ratio.

  5. An indirect metric of the proportion of Black residents in the town.

Preparing training & test files

As we did before, we first need to convert the raw data into CNTK format. We are going to use the first 80 data items for training, so the tab-delimited CNTK format is as follows −

|predictors 1.612820 96.90 3.76 21.00 248.31 |medval 13.50
|predictors 0.064170 68.20 3.36 19.20 396.90 |medval 18.90
|predictors 0.097440 61.40 3.38 19.20 377.56 |medval 20.00
. . .

The next 20 items, also converted into CNTK format, will be used for testing.
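If you need to produce these files yourself, a minimal conversion sketch follows. The raw file name, the output file name, and the assumption that each raw row holds the five predictor columns followed by medval are all hypothetical; adapt them to however you extracted the six columns from the UCI data −

# hypothetical converter: writes raw whitespace-separated rows as CTF lines
def make_ctf(src_file, dst_file, num_items):
   with open(src_file, "r") as fin, open(dst_file, "w") as fout:
      for _ in range(num_items):
         vals = fin.readline().split()
         fout.write("|predictors " + " ".join(vals[0:5]) + " |medval " + vals[5] + "\n")

# make_ctf("boston_raw.txt", "boston_train_cntk.txt", 80) # first 80 items for training
# the next 20 rows would go into the test file in the same way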

Constructing Regression model

First, we need to process the data files in CNTK format. For that, we will use the helper function named create_reader, as follows −

def create_reader(path, input_dim, output_dim, rnd_order, sweeps):
   # map the |predictors and |medval fields of the CTF file to streams
   x_strm = C.io.StreamDef(field='predictors', shape=input_dim, is_sparse=False)
   y_strm = C.io.StreamDef(field='medval', shape=output_dim, is_sparse=False)
   streams = C.io.StreamDefs(x_src=x_strm, y_src=y_strm)
   deserial = C.io.CTFDeserializer(path, streams)
   mb_src = C.io.MinibatchSource(deserial, randomize=rnd_order, max_sweeps=sweeps)
   return mb_src

Next, we need a helper function that accepts a CNTK mini-batch object and computes a custom accuracy metric. Since there is no natural notion of a correct prediction in regression, we count a prediction as correct when it is within delta of the true value.

def mb_accuracy(mb, x_var, y_var, model, delta):
   num_correct = 0
   num_wrong = 0
   x_mat = mb[x_var].asarray()
   y_mat = mb[y_var].asarray()
   for i in range(mb[x_var].shape[0]):
      v = model.eval(x_mat[i])   # predicted median value
      y = y_mat[i]               # actual median value
      # a prediction is "correct" if it is within delta of the actual value
      if np.abs(v[0,0] - y[0,0]) < delta:
         num_correct += 1
      else:
         num_wrong += 1
   return (num_correct * 100.0) / (num_correct + num_wrong)
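As a quick illustration of the delta rule above, with delta = 3.00 a prediction of 21.0 against an actual value of 19.5 counts as correct, while the same prediction against an actual 24.5 counts as wrong −

import numpy as np

delta = 3.00
print(np.abs(21.0 - 19.5) < delta) # True  -> counted as correct
print(np.abs(21.0 - 24.5) < delta) # False -> counted as wrong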

Now, we need to set the architecture arguments for our NN and also provide the locations of the data files. It can be done with the help of the following Python code −

def main():
   print("Using CNTK version = " + str(C.__version__) + "\n")
   input_dim = 5    # five predictor variables
   hidden_dim = 20  # size of the single hidden layer
   output_dim = 1   # one value to predict (medval)
   train_file = ".\\...\\" # provide the name of the training file (80 data items)
   test_file = ".\\...\\"  # provide the name of the test file (20 data items)

Now, with the help of the following lines of code, our program will create the untrained NN −

X = C.ops.input_variable(input_dim, np.float32)
Y = C.ops.input_variable(output_dim, np.float32)
with C.layers.default_options(init=C.initializer.uniform(scale=0.01, seed=1)):
   hLayer = C.layers.Dense(hidden_dim, activation=C.ops.tanh, name='hidLayer')(X)
   # no activation on the output layer, as this is a regression model
   oLayer = C.layers.Dense(output_dim, activation=None, name='outLayer')(hLayer)
model = C.ops.alias(oLayer)

Now, once we have created the untrained model, we need to set up a Learner algorithm object. We are going to use the SGD learner and the squared_error loss function −

tr_loss = C.squared_error(model, Y)
max_iter = 3000
batch_size = 5
base_learn_rate = 0.02
# use base_learn_rate for the first half of training, then half that rate
sch = C.learning_parameter_schedule([base_learn_rate, base_learn_rate/2], minibatch_size=batch_size, epoch_size=int((max_iter*batch_size)/2))
learner = C.sgd(model.parameters, sch)
trainer = C.Trainer(model, (tr_loss), [learner])
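For intuition, squared_error sums the squared differences between the model’s outputs and the target values. A rough numpy equivalent of what the criterion computes (an illustration, not the CNTK implementation) is −

import numpy as np

def squared_error_np(pred, target):
   # sum of squared differences, mirroring the idea behind C.squared_error
   return np.sum((pred - target) ** 2)

print(squared_error_np(np.array([20.5]), np.array([19.0]))) # 2.25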

Now, once we are done with the Learner algorithm object, we need to create a reader function to read the training data −

rdr = create_reader(train_file, input_dim, output_dim, rnd_order=True, sweeps=C.io.INFINITELY_REPEAT)
boston_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }

Now, it’s time to train our NN model −

for i in range(0, max_iter):
   curr_batch = rdr.next_minibatch(batch_size, input_map=boston_input_map)
   trainer.train_minibatch(curr_batch)
   if i % int(max_iter/10) == 0:
      mcee = trainer.previous_minibatch_loss_average
      acc = mb_accuracy(curr_batch, X, Y, model, delta=3.00)
      print("batch %4d: mean squared error = %8.4f, accuracy = %5.2f%%" % (i, mcee, acc))

Once we are done with training, let’s evaluate the model using the test data items −

print("\nEvaluating test data \n")
rdr = create_reader(test_file, input_dim, output_dim, rnd_order=False, sweeps=1)
boston_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }
num_test = 20
all_test = rdr.next_minibatch(num_test, input_map=boston_input_map)
acc = mb_accuracy(all_test, X, Y, model, delta=3.00)
print("Prediction accuracy = %0.2f%%" % acc)

After evaluating the accuracy of our trained NN model, we will use it to make a prediction on unseen data −

np.set_printoptions(precision = 2, suppress=True)
unknown = np.array([[0.09, 50.00, 4.5, 17.00, 350.00]], dtype=np.float32)
print("\nPredicting median home value for feature/predictor values: ")
print(unknown[0])
pred_value = model.eval({X: unknown})
print("\nPredicted value is: ")
print("$%0.2f (x1000)" % pred_value[0,0])

Complete Regression Model

import numpy as np
import cntk as C

def create_reader(path, input_dim, output_dim, rnd_order, sweeps):
   x_strm = C.io.StreamDef(field='predictors', shape=input_dim, is_sparse=False)
   y_strm = C.io.StreamDef(field='medval', shape=output_dim, is_sparse=False)
   streams = C.io.StreamDefs(x_src=x_strm, y_src=y_strm)
   deserial = C.io.CTFDeserializer(path, streams)
   mb_src = C.io.MinibatchSource(deserial, randomize=rnd_order, max_sweeps=sweeps)
   return mb_src

def mb_accuracy(mb, x_var, y_var, model, delta):
   num_correct = 0
   num_wrong = 0
   x_mat = mb[x_var].asarray()
   y_mat = mb[y_var].asarray()
   for i in range(mb[x_var].shape[0]):
      v = model.eval(x_mat[i])
      y = y_mat[i]
      if np.abs(v[0,0] - y[0,0]) < delta:
         num_correct += 1
      else:
         num_wrong += 1
   return (num_correct * 100.0) / (num_correct + num_wrong)

def main():
   print("Using CNTK version = " + str(C.__version__) + "\n")
   input_dim = 5
   hidden_dim = 20
   output_dim = 1
   train_file = ".\\...\\" # provide the name of the training file (80 data items)
   test_file = ".\\...\\"  # provide the name of the test file (20 data items)
   X = C.ops.input_variable(input_dim, np.float32)
   Y = C.ops.input_variable(output_dim, np.float32)
   with C.layers.default_options(init=C.initializer.uniform(scale=0.01, seed=1)):
      hLayer = C.layers.Dense(hidden_dim, activation=C.ops.tanh, name='hidLayer')(X)
      oLayer = C.layers.Dense(output_dim, activation=None, name='outLayer')(hLayer)
   model = C.ops.alias(oLayer)
   tr_loss = C.squared_error(model, Y)
   max_iter = 3000
   batch_size = 5
   base_learn_rate = 0.02
   sch = C.learning_parameter_schedule([base_learn_rate, base_learn_rate/2], minibatch_size=batch_size, epoch_size=int((max_iter*batch_size)/2))
   learner = C.sgd(model.parameters, sch)
   trainer = C.Trainer(model, (tr_loss), [learner])
   rdr = create_reader(train_file, input_dim, output_dim, rnd_order=True, sweeps=C.io.INFINITELY_REPEAT)
   boston_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }
   for i in range(0, max_iter):
      curr_batch = rdr.next_minibatch(batch_size, input_map=boston_input_map)
      trainer.train_minibatch(curr_batch)
      if i % int(max_iter/10) == 0:
         mcee = trainer.previous_minibatch_loss_average
         acc = mb_accuracy(curr_batch, X, Y, model, delta=3.00)
         print("batch %4d: mean squared error = %8.4f, accuracy = %5.2f%%" % (i, mcee, acc))
   print("\nEvaluating test data \n")
   rdr = create_reader(test_file, input_dim, output_dim, rnd_order=False, sweeps=1)
   boston_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }
   num_test = 20
   all_test = rdr.next_minibatch(num_test, input_map=boston_input_map)
   acc = mb_accuracy(all_test, X, Y, model, delta=3.00)
   print("Prediction accuracy = %0.2f%%" % acc)
   np.set_printoptions(precision = 2, suppress=True)
   unknown = np.array([[0.09, 50.00, 4.5, 17.00, 350.00]], dtype=np.float32)
   print("\nPredicting median home value for feature/predictor values: ")
   print(unknown[0])
   pred_value = model.eval({X: unknown})
   print("\nPredicted value is: ")
   print("$%0.2f (x1000)" % pred_value[0,0])

if __name__== "__main__":
   main()

Output

Using CNTK version = 2.7
batch 0: mean squared error = 385.6727, accuracy = 0.00%
batch 300: mean squared error = 41.6229, accuracy = 20.00%
batch 600: mean squared error = 28.7667, accuracy = 40.00%
batch 900: mean squared error = 48.6435, accuracy = 40.00%
batch 1200: mean squared error = 77.9562, accuracy = 80.00%
batch 1500: mean squared error = 7.8342, accuracy = 60.00%
batch 1800: mean squared error = 47.7062, accuracy = 60.00%
batch 2100: mean squared error = 40.5068, accuracy = 40.00%
batch 2400: mean squared error = 46.5023, accuracy = 40.00%
batch 2700: mean squared error = 15.6235, accuracy = 60.00%
Evaluating test data
Prediction accuracy = 64.00%
Predicting median home value for feature/predictor values:
[0.09 50. 4.5 17. 350.]
Predicted value is:
$21.02(x1000)

Saving the trained model

This Boston home values dataset has only 506 data items (of which we used only 100). Hence, training the NN regression model takes only a few seconds, but training on a large dataset with hundreds of thousands of data items can take hours or even days.

We can save our model so that we won’t have to retrain it from scratch. With the help of the following Python code, we can save our trained NN −

nn_regressor = ".\\neuralregressor.model" # provide the name of the file
model.save(nn_regressor, format=C.ModelFormat.CNTKv2)

Following are the arguments of the save() function used above −

  1. The file name is the first argument of the save() function. It can also be written along with the path of the file.

  2. Another parameter is the format parameter, which has a default value of C.ModelFormat.CNTKv2.

Loading the trained model

Once you have saved the trained model, it’s very easy to load that model. We only need to use the load() function. Let’s check this in the following example −

import numpy as np
import cntk as C

model = C.ops.functions.Function.load(".\\neuralregressor.model")
np.set_printoptions(precision = 2, suppress=True)
unknown = np.array([[0.09, 50.00, 4.5, 17.00, 350.00]], dtype=np.float32)
print("\nPredicting area median home value for feature/predictor values: ")
print(unknown[0])
# the X variable from the training program is not in scope here, so we
# feed the loaded model's own input argument instead
pred_value = model.eval({model.arguments[0]: unknown})
print("\nPredicted value is: ")
print("$%0.2f (x1000)" % pred_value[0,0])

The benefit of saving a model is that once you load it, the model can be used exactly as if it had just been trained.
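As a quick sanity check, a sketch like the following (assuming the trained model and the unknown array from the program above are still in scope) verifies that predictions survive the save/load round trip −

# hypothetical round-trip check: predictions should match exactly
before = model.eval({model.arguments[0]: unknown})
model.save(".\\neuralregressor.model", format=C.ModelFormat.CNTKv2)
reloaded = C.ops.functions.Function.load(".\\neuralregressor.model")
after = reloaded.eval({reloaded.arguments[0]: unknown})
print(np.allclose(before, after)) # expect True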