Keras Concise Tutorial

Keras - Model Compilation

Previously, we studied the basics of creating a model with the Sequential and Functional APIs. This chapter explains how to compile the model. Compilation is the final step in creating a model. Once it is done, we can move on to the training phase.


Let us learn a few concepts required to better understand the compilation process.


Loss

In machine learning, a loss function measures the error, or deviation, of the model's predictions from the true values during learning. Keras requires a loss function when compiling a model.

Keras provides quite a few loss functions in the losses module, as follows −

  1. mean_squared_error

  2. mean_absolute_error

  3. mean_absolute_percentage_error

  4. mean_squared_logarithmic_error

  5. squared_hinge

  6. hinge

  7. categorical_hinge

  8. logcosh

  9. huber_loss

  10. categorical_crossentropy

  11. sparse_categorical_crossentropy

  12. binary_crossentropy

  13. kullback_leibler_divergence

  14. poisson

  15. cosine_proximity

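  16. is_categorical_crossentropy (a helper that checks whether a given loss is categorical cross-entropy, not a loss function itself)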


All of the loss functions above accept two arguments −

  1. y_true − true labels as tensors

  2. y_pred − prediction with same shape as y_true


Import the losses module before using a loss function, as shown below −

from keras import losses
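
To make the (y_true, y_pred) convention concrete, here is a minimal plain-Python sketch of what mean_squared_error and mean_absolute_error compute for a single sample. It is an illustration only, not Keras's implementation (which operates on tensors), and the sample values are made up.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up example values, chosen only for illustration.
y_true = [1.0, 0.0, 2.0]
y_pred = [0.5, 0.0, 3.0]

mse = mean_squared_error(y_true, y_pred)   # (0.25 + 0.0 + 1.0) / 3
mae = mean_absolute_error(y_true, y_pred)  # (0.5 + 0.0 + 1.0) / 3
print(mse, mae)
```

The squared error penalizes the one large miss (2.0 predicted as 3.0) more heavily than the absolute error does, which is the usual reason to choose one over the other.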


Optimizer

In machine learning, optimization is the process of adjusting the model's weights to reduce the loss computed from the predictions and the true values. Keras provides quite a few optimizers in the optimizers module, as follows −

SGD − Stochastic gradient descent optimizer.

keras.optimizers.SGD(learning_rate = 0.01, momentum = 0.0, nesterov = False)

RMSprop − RMSProp optimizer.

keras.optimizers.RMSprop(learning_rate = 0.001, rho = 0.9)

Adagrad − Adagrad optimizer.

keras.optimizers.Adagrad(learning_rate = 0.01)

Adadelta − Adadelta optimizer.

keras.optimizers.Adadelta(learning_rate = 1.0, rho = 0.95)

Adam − Adam optimizer.

keras.optimizers.Adam(
   learning_rate = 0.001, beta_1 = 0.9, beta_2 = 0.999, amsgrad = False
)

Adamax − Adamax optimizer, a variant of Adam based on the infinity norm.

keras.optimizers.Adamax(learning_rate = 0.002, beta_1 = 0.9, beta_2 = 0.999)

Nadam − Nesterov Adam optimizer.

keras.optimizers.Nadam(learning_rate = 0.002, beta_1 = 0.9, beta_2 = 0.999)


Import the optimizers module before using an optimizer, as shown below −

from keras import optimizers
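
As a rough illustration of what the learning_rate and momentum arguments of SGD control, the following plain-Python sketch applies the classic (non-Nesterov) momentum update to a single weight. The helper name sgd_minimize and the toy loss (w - 3)^2 are assumptions made for this example; this is not Keras's code.

```python
def sgd_minimize(grad_fn, w, learning_rate=0.01, momentum=0.0, steps=100):
    """Plain SGD with optional (non-Nesterov) momentum on a single weight."""
    velocity = 0.0
    for _ in range(steps):
        gradient = grad_fn(w)
        # The velocity accumulates past gradients; with momentum = 0.0 this
        # reduces to the basic update w = w - learning_rate * gradient.
        velocity = momentum * velocity - learning_rate * gradient
        w = w + velocity
    return w


def grad(w):
    # Gradient of the toy loss (w - 3)^2, which is smallest at w = 3.
    return 2.0 * (w - 3.0)


w_plain = sgd_minimize(grad, w=0.0, learning_rate=0.1, steps=100)
w_momentum = sgd_minimize(grad, w=0.0, learning_rate=0.1, momentum=0.9, steps=300)
print(w_plain, w_momentum)  # both end up close to 3.0
```

The other optimizers listed above differ mainly in how they scale this step for each weight, for example by tracking running averages of past gradients.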


Metrics

In machine learning, metrics are used to evaluate the performance of your model. A metric is similar to a loss function, but its value is not used to train the model. Keras provides quite a few metrics in the metrics module, as follows −

  1. accuracy

  2. binary_accuracy

  3. categorical_accuracy

  4. sparse_categorical_accuracy

  5. top_k_categorical_accuracy

  6. sparse_top_k_categorical_accuracy

  7. cosine_proximity

  8. clone_metric (a helper that clones a metric object, not a metric itself)

Like loss functions, metrics accept the following two arguments −

  1. y_true − true labels as tensors

  2. y_pred − prediction with same shape as y_true

Import the metrics module before using metrics, as shown below −

from keras import metrics
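
For intuition, categorical accuracy can be sketched in plain Python as the fraction of samples whose highest-scoring predicted class matches the true class. This is a simplified illustration with made-up values, not the Keras implementation.

```python
def categorical_accuracy(y_true, y_pred):
    """Fraction of samples whose top predicted class equals the true class."""
    def argmax(values):
        return max(range(len(values)), key=lambda i: values[i])

    hits = sum(1 for t, p in zip(y_true, y_pred) if argmax(t) == argmax(p))
    return hits / len(y_true)


# One-hot true labels and made-up predicted probabilities for three samples.
y_true = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
y_pred = [[0.1, 0.2, 0.7], [0.3, 0.4, 0.3], [0.2, 0.5, 0.3]]

acc = categorical_accuracy(y_true, y_pred)
print(acc)  # two of the three samples are classified correctly
```

Unlike a loss, this value changes only when a prediction crosses a class boundary, which is one reason it is reported rather than optimized directly.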

Compile the model

A Keras model provides the compile() method to compile the model. The arguments and default values of compile() are as follows −

compile(
   optimizer,
   loss = None,
   metrics = None,
   loss_weights = None,
   sample_weight_mode = None,
   weighted_metrics = None,
   target_tensors = None
)

The important arguments are as follows −

  1. loss function

  2. Optimizer

  3. metrics

Sample code to compile the model is as follows −

from keras import losses
from keras import optimizers
from keras import metrics

model.compile(loss = 'mean_squared_error',
   optimizer = 'sgd', metrics = [metrics.categorical_accuracy])



  1. loss function is set as mean_squared_error

  2. optimizer is set as sgd

  3. metrics is set as metrics.categorical_accuracy

Model Training

Models are trained on NumPy arrays using fit(). The main purpose of fit() is to train the model on the training data. It can also evaluate the model on validation data during training, and the history it returns can be used to graph model performance. It has the following syntax −

model.fit(X, y, epochs = , batch_size = )



  1. X, y − the training data: the input samples and their target values.

  2. epochs − number of times the model iterates over the entire training data.

  3. batch_size − number of training instances used per weight update.
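
The relationship between epochs and batch_size can be worked out with a little arithmetic. The sketch below uses hypothetical helper names and example values (100 samples, batch size 32, 5 epochs) and ignores the shuffling that fit() may apply between epochs.

```python
import math


def updates_per_epoch(num_samples, batch_size):
    """Number of batches (weight updates) needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


def batch_slices(num_samples, batch_size):
    """Yield (start, stop) index pairs that cover the data once (one epoch)."""
    for start in range(0, num_samples, batch_size):
        yield start, min(start + batch_size, num_samples)


steps = updates_per_epoch(100, 32)      # 100 / 32 rounds up to 4 batches
total_updates = 5 * steps               # 5 epochs of 4 updates each
slices = list(batch_slices(100, 32))    # the last batch holds only 4 samples
print(steps, total_updates, slices)
```

A smaller batch_size therefore means more weight updates per epoch, each computed from fewer samples.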

Let us take a simple example using random NumPy data to apply this concept.

Create data

Let us create random data for x and y using NumPy with the command below −

import numpy as np

x_train = np.random.random((100,4,8))
y_train = np.random.random((100,10))


Now, create random validation data −

x_val = np.random.random((100,4,8))
y_val = np.random.random((100,10))

Create model

Let us create a simple sequential model −

from keras.models import Sequential

model = Sequential()

Add layers

Add layers to the model −

from keras.layers import LSTM, Dense

# the LSTM returns only its last output, a vector of dimension 16 per sample,
# so the model output has shape (batch, 10) and matches y_train
model.add(LSTM(16, return_sequences = False))
model.add(Dense(10, activation = 'softmax'))
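
The return_sequences flag decides whether the LSTM emits one vector per time step or only the last one, and that determines whether the model output can line up with targets of shape (100, 10) such as y_train above. The helpers below are hypothetical shape calculators written for this explanation; they are not part of Keras.

```python
def lstm_output_shape(input_shape, units, return_sequences):
    """Output shape of an LSTM for input (batch, timesteps, features)."""
    batch, timesteps, _ = input_shape
    if return_sequences:
        return (batch, timesteps, units)  # one vector per time step
    return (batch, units)                 # only the last time step


def dense_output_shape(input_shape, units):
    """Dense acts on the last axis and leaves the other axes unchanged."""
    return input_shape[:-1] + (units,)


x_shape = (100, 4, 8)  # 100 samples, 4 time steps, 8 features
last_only = dense_output_shape(lstm_output_shape(x_shape, 16, False), 10)
per_step = dense_output_shape(lstm_output_shape(x_shape, 16, True), 10)
print(last_only, per_step)
```

Only the first shape, (100, 10), matches targets with one 10-value row per sample; the per-step variant would need targets with an extra time-step axis.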

Compile model

The model is now defined. You can compile it using the command below −

model.compile(
   loss = 'categorical_crossentropy', optimizer = 'sgd', metrics = ['accuracy']
)

Apply fit()

Now we apply the fit() function to train the model on our data −

model.fit(x_train, y_train, batch_size = 32, epochs = 5, validation_data = (x_val, y_val))

Create a Multi-Layer Perceptron ANN

We have learned to create, compile and train Keras models.

Let us apply what we have learned and create a simple MLP-based ANN.

Dataset module

Before creating a model, we need to choose a problem, collect the required data and convert the data to NumPy arrays. Once the data is collected, we can prepare the model and train it using that data. Data collection is one of the most difficult phases of machine learning. Keras provides a special module, datasets, to download online machine learning data for training purposes. It fetches the data from an online server, processes it and returns it as training and test sets. Let us check the data provided by the Keras datasets module. The data available in the module are as follows −

  1. CIFAR10 small image classification

  2. CIFAR100 small image classification

  3. IMDB Movie reviews sentiment classification

  4. Reuters newswire topics classification

  5. MNIST database of handwritten digits

  6. Fashion-MNIST database of fashion articles

  7. Boston housing price regression dataset

Let us use the MNIST database of handwritten digits (or mnist) as our input. MNIST is a collection of 60,000 28x28 grayscale training images of the 10 digits (0 to 9). It also contains 10,000 test images.

The code below can be used to load the dataset −

from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()



  1. Line 1 imports mnist from the Keras datasets module.

  2. Line 3 calls the load_data function, which fetches the data from an online server and returns it as two tuples. The first tuple, (x_train, y_train), holds the training images with shape (num_samples, 28, 28) and their digit labels with shape (num_samples, ). The second tuple, (x_test, y_test), holds the test data in the same format.

Other datasets can be fetched using a similar API, and each returns data in a similar form, except for the shape of the data. The shape depends on the type of data.

Create a model

Let us choose a simple multi-layer perceptron (MLP) and try to create the model using Keras.



The core features of the model are as follows −

  1. Input layer consists of 784 values (28 x 28 = 784).

  2. First hidden layer, Dense consists of 512 neurons and ‘relu’ activation function.

  3. Second hidden layer, Dropout has 0.2 as its value.

  4. Third hidden layer, again Dense consists of 512 neurons and ‘relu’ activation function.

  5. Fourth hidden layer, Dropout has 0.2 as its value.

  6. Fifth and final layer consists of 10 neurons and ‘softmax’ activation function.

  7. Use categorical_crossentropy as loss function.

  8. Use RMSprop() as Optimizer.

  9. Use accuracy as metrics.

  10. Use 128 as batch size.

  11. Use 20 as epochs.
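
The layer sizes listed above fix the number of trainable weights. A Dense layer has one weight per input-output pair plus one bias per output, and Dropout adds no weights. The short calculation below, with a hypothetical helper name, works this out for the 784-512-512-10 network.

```python
def dense_params(inputs, units):
    """Trainable parameters of a Dense layer: weight matrix plus biases."""
    return inputs * units + units


layer_sizes = [784, 512, 512, 10]  # input, two hidden layers, output
per_layer = [dense_params(i, o) for i, o in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)
print(per_layer, total_params)
```

Most of the parameters sit in the first two Dense layers, which is why Dropout is applied after each of them.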

Step 1 − Import the modules



Let us import the necessary modules.

import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import RMSprop
import numpy as np

Step 2 − Load data


Let us load the mnist dataset.

(x_train, y_train), (x_test, y_test) = mnist.load_data()

Step 3 − Process the data



Let us transform the dataset so that it can be fed into our model.

x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)



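  1. reshape is used to flatten each input image from shape (28, 28) to (784, )

  2. dividing by 255 scales the pixel values to the range [0, 1]

  3. to_categorical is used to convert the vector of class labels to a binary (one-hot) matrix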
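
To see what these preprocessing steps do to a single sample, here is a plain-Python sketch on a tiny made-up 2x2 "image". The helper names are hypothetical stand-ins for the NumPy and Keras calls, chosen only to show the transformation.

```python
def flatten(image):
    """Turn a 2-D grid of pixels into one flat list, row by row."""
    return [pixel for row in image for pixel in row]


def scale(pixels, max_value=255):
    """Map integer pixel values in [0, max_value] to floats in [0, 1]."""
    return [p / max_value for p in pixels]


def one_hot(label, num_classes):
    """Binary vector with a single 1.0 at the index of the class label."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


image = [[0, 255], [51, 102]]       # made-up 2x2 grayscale image
features = scale(flatten(image))    # 4 values between 0 and 1
target = one_hot(3, 10)             # the digit 3 as a 10-element vector
print(features, target)
```

For MNIST the same idea turns each 28x28 image into 784 values and each digit label into a row of 10 values, matching the 10 output neurons of the model.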

Step 4 − Create the model



Let us create the actual model.

model = Sequential()
model.add(Dense(512, activation = 'relu', input_shape = (784,)))
model.add(Dropout(0.2))
model.add(Dense(512, activation = 'relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation = 'softmax'))
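
The Dropout value of 0.2 listed among the model's features is the fraction of activations set to zero during training. A minimal sketch of the idea (inverted dropout, where the surviving values are scaled up so the expected sum stays the same) is shown below. The function is an illustration with a fixed random seed, not the Keras layer.

```python
import random


def dropout(values, rate, rng):
    """Zero each value with probability `rate`; scale survivors by 1/(1-rate)."""
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]


rng = random.Random(0)              # fixed seed so the run is repeatable
activations = [1.0] * 10000         # made-up activations, all equal to 1
dropped = dropout(activations, rate=0.2, rng=rng)

zero_fraction = dropped.count(0.0) / len(dropped)   # close to 0.2
mean_after = sum(dropped) / len(dropped)            # close to 1.0
print(zero_fraction, mean_after)
```

Dropout is active only during training. At prediction time the layer passes its input through unchanged.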

Step 5 − Compile the model



Let us compile the model using the selected loss function, optimizer and metrics.

model.compile(loss = 'categorical_crossentropy',
   optimizer = RMSprop(),
   metrics = ['accuracy'])
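
The softmax output layer and the categorical_crossentropy loss work as a pair: softmax turns the 10 raw scores into probabilities, and the loss is the negative log of the probability assigned to the true class. The following plain-Python sketch, using three classes and made-up scores, illustrates the arithmetic. It is not the Keras implementation.

```python
import math


def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    top = max(scores)                      # subtract the max for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(y_true, y_pred):
    """Negative log-probability of the true class (y_true is one-hot)."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)


probs = softmax([2.0, 1.0, 0.1])           # made-up scores for 3 classes
loss = categorical_crossentropy([1.0, 0.0, 0.0], probs)
print(probs, loss)
```

The loss approaches 0 as the probability of the true class approaches 1, and grows without bound as that probability approaches 0.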

Step 6 − Train the model


Let us train the model using the fit() method.

history = model.fit(
   x_train, y_train,
   batch_size = 128,
   epochs = 20,
   verbose = 1,
   validation_data = (x_test, y_test)
)

Final thoughts


We have created the model, loaded the data and trained the model on the data. We still need to evaluate the model and predict output for unknown input, which we will learn in the upcoming chapter.

Executing the application will give the following output −

Train on 60000 samples, validate on 10000 samples
Epoch 1/20
60000/60000 [==============================] - 7s 118us/step - loss: 0.2453 - acc: 0.9236 - val_loss: 0.1004 - val_acc: 0.9675
Epoch 2/20
60000/60000 [==============================] - 7s 110us/step - loss: 0.1023 - acc: 0.9693 - val_loss: 0.0797 - val_acc: 0.9761
Epoch 3/20
60000/60000 [==============================] - 7s 110us/step - loss: 0.0744 - acc: 0.9770 - val_loss: 0.0727 - val_acc: 0.9791
Epoch 4/20
60000/60000 [==============================] - 7s 110us/step - loss: 0.0599 - acc: 0.9823 - val_loss: 0.0704 - val_acc: 0.9801
Epoch 5/20
60000/60000 [==============================] - 7s 112us/step - loss: 0.0504 - acc: 0.9853 - val_loss: 0.0714 - val_acc: 0.9817
Epoch 6/20
60000/60000 [==============================] - 7s 111us/step - loss: 0.0438 - acc: 0.9868 - val_loss: 0.0845 - val_acc: 0.9809
Epoch 7/20
60000/60000 [==============================] - 7s 114us/step - loss: 0.0391 - acc: 0.9887 - val_loss: 0.0823 - val_acc: 0.9802
Epoch 8/20
60000/60000 [==============================] - 7s 112us/step - loss: 0.0364 - acc: 0.9892 - val_loss: 0.0818 - val_acc: 0.9830
Epoch 9/20
60000/60000 [==============================] - 7s 113us/step - loss: 0.0308 - acc: 0.9905 - val_loss: 0.0833 - val_acc: 0.9829
Epoch 10/20
60000/60000 [==============================] - 7s 112us/step - loss: 0.0289 - acc: 0.9917 - val_loss: 0.0947 - val_acc: 0.9815
Epoch 11/20
60000/60000 [==============================] - 7s 112us/step - loss: 0.0279 - acc: 0.9921 - val_loss: 0.0818 - val_acc: 0.9831
Epoch 12/20
60000/60000 [==============================] - 7s 112us/step - loss: 0.0260 - acc: 0.9927 - val_loss: 0.0945 - val_acc: 0.9819
Epoch 13/20
60000/60000 [==============================] - 7s 112us/step - loss: 0.0257 - acc: 0.9931 - val_loss: 0.0952 - val_acc: 0.9836
Epoch 14/20
60000/60000 [==============================] - 7s 112us/step - loss: 0.0229 - acc: 0.9937 - val_loss: 0.0924 - val_acc: 0.9832
Epoch 15/20
60000/60000 [==============================] - 7s 115us/step - loss: 0.0235 - acc: 0.9937 - val_loss: 0.1004 - val_acc: 0.9823
Epoch 16/20
60000/60000 [==============================] - 7s 113us/step - loss: 0.0214 - acc: 0.9941 - val_loss: 0.0991 - val_acc: 0.9847
Epoch 17/20
60000/60000 [==============================] - 7s 112us/step - loss: 0.0219 - acc: 0.9943 - val_loss: 0.1044 - val_acc: 0.9837
Epoch 18/20
60000/60000 [==============================] - 7s 112us/step - loss: 0.0190 - acc: 0.9952 - val_loss: 0.1129 - val_acc: 0.9836
Epoch 19/20
60000/60000 [==============================] - 7s 112us/step - loss: 0.0197 - acc: 0.9953 - val_loss: 0.0981 - val_acc: 0.9841
Epoch 20/20
60000/60000 [==============================] - 7s 112us/step - loss: 0.0198 - acc: 0.9950 - val_loss: 0.1215 - val_acc: 0.9828
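
The per-epoch numbers in this output can be examined to find where validation performance peaked. The sketch below copies the val_loss and val_acc columns from the log into plain lists. In a real run such lists would normally be read from the history object returned by fit(), whose key names vary between Keras versions, so that part is left as an assumption here.

```python
# Validation values copied from the 20 epochs of the training output.
val_loss = [0.1004, 0.0797, 0.0727, 0.0704, 0.0714, 0.0845, 0.0823, 0.0818,
            0.0833, 0.0947, 0.0818, 0.0945, 0.0952, 0.0924, 0.1004, 0.0991,
            0.1044, 0.1129, 0.0981, 0.1215]
val_acc = [0.9675, 0.9761, 0.9791, 0.9801, 0.9817, 0.9809, 0.9802, 0.9830,
           0.9829, 0.9815, 0.9831, 0.9819, 0.9836, 0.9832, 0.9823, 0.9847,
           0.9837, 0.9836, 0.9841, 0.9828]

# Epochs are numbered from 1, list indices from 0.
best_loss_epoch = val_loss.index(min(val_loss)) + 1
best_acc_epoch = val_acc.index(max(val_acc)) + 1
print(best_loss_epoch, min(val_loss))
print(best_acc_epoch, max(val_acc))
```

The training loss keeps falling, from 0.2453 to 0.0198, while the validation loss is lowest at epoch 4 and drifts upward afterwards. This pattern is a common sign that the model is starting to overfit the training data.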