Python Deep Learning Tutorial
Python Deep Learning - Introduction
Deep structured learning or hierarchical learning or deep learning in short is part of the family of machine learning methods which are themselves a subset of the broader field of Artificial Intelligence.
Deep learning is a class of machine learning algorithms that use several layers of nonlinear processing units for feature extraction and transformation. Each successive layer uses the output from the previous layer as input.
Deep neural networks, deep belief networks and recurrent neural networks have been applied to fields such as computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, and bioinformatics, where they have produced results comparable to, and in some cases better than, human experts.
Deep Learning Algorithms and Networks −
-
are based on the unsupervised learning of multiple levels of features or representations of the data. Higher-level features are derived from lower level features to form a hierarchical representation.
-
use some form of gradient descent for training.
Python Deep Learning - Environment
In this chapter, we will learn about the environment set up for Python Deep Learning. We have to install the following software for making deep learning algorithms.
-
Python 2.7+
-
Scipy with Numpy
-
Matplotlib
-
Theano
-
Keras
-
TensorFlow
It is strongly recommended that Python, NumPy, SciPy, and Matplotlib are installed through the Anaconda distribution. It comes with all of those packages.
We need to ensure that the different types of software are installed properly.
Let us go to our command line program and type in the following command −
$ python
Python 3.6.3 |Anaconda custom (32-bit)| (default, Oct 13 2017, 14:21:34)
[GCC 7.2.0] on linux
Next, we can import the required libraries and print their versions −
import numpy
print(numpy.__version__)
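If SciPy and Matplotlib were installed through Anaconda as recommended, a similar check can be run for them as well (a small sketch; the exact version numbers printed will depend on your installation) −
import numpy
import scipy
import matplotlib
print(numpy.__version__)
print(scipy.__version__)
print(matplotlib.__version__)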
Installation of Theano, TensorFlow and Keras
Before we begin with the installation of the packages − Theano, TensorFlow and Keras − we need to confirm that pip is installed. pip is the package manager used to install Python packages, and it comes bundled with the Anaconda distribution.
To confirm the installation of pip, type the following in the command line −
$ pip
Once the installation of pip is confirmed, we can install Theano, TensorFlow and Keras by executing the following commands −
$ pip install theano
$ pip install tensorflow
$ pip install keras
Confirm the installation of Theano by executing the following line of code −
$ python -c "import theano; print(theano.__version__)"
Output
1.0.1
Confirm the installation of Tensorflow by executing the following line of code −
$ python -c "import tensorflow; print(tensorflow.__version__)"
Python Deep Learning - Basic Machine Learning
Artificial Intelligence (AI) is any code, algorithm or technique that enables a computer to mimic human cognitive behaviour or intelligence. Machine Learning (ML) is a subset of AI that uses statistical methods to enable machines to learn and improve with experience. Deep Learning is a subset of Machine Learning, which makes the computation of multi-layer neural networks feasible. Machine Learning is seen as shallow learning while Deep Learning is seen as hierarchical learning with abstraction.
Machine learning deals with a wide range of concepts. The concepts are listed below −
-
supervised learning
-
unsupervised learning
-
reinforcement learning
-
linear regression
-
cost functions
-
overfitting
-
under-fitting
-
hyper-parameter, etc.
In supervised learning, we learn to predict values from labelled data. One ML technique that helps here is classification, where the target values are discrete; for example, cats and dogs. Another technique in machine learning that can help is regression. Regression works on continuous target values; for example, stock market data can be analysed using regression.
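As a hedged illustration of the two techniques (the toy data below is invented purely to contrast discrete and continuous targets; scikit-learn ships with the Anaconda distribution) −
from sklearn.linear_model import LogisticRegression, LinearRegression

X = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]

# classification: discrete target values (0 = cat, 1 = dog)
clf = LogisticRegression().fit(X, [0, 0, 0, 1, 1, 1])
print(clf.predict([[2.8]]))

# regression: continuous target values (for example, a price)
reg = LinearRegression().fit(X, [1.1, 2.0, 2.9, 6.1, 7.0, 8.2])
print(reg.predict([[2.8]]))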
In unsupervised learning, we make inferences from input data that is not labelled or structured. If we have a million medical records and we have to make sense of them, find the underlying structure, find outliers or detect anomalies, we use clustering techniques to divide the data into broad clusters.
Data sets are divided into training sets, testing sets, validation sets and so on.
A breakthrough in 2012 brought the concept of Deep Learning into prominence. An algorithm successfully classified 1 million images into 1,000 categories using 2 GPUs and the latest technologies like Big Data.
Relating Deep Learning and Traditional Machine Learning
One of the major challenges encountered in traditional machine learning models is a process called feature extraction. The programmer needs to be specific and tell the computer the features to be looked out for. These features will help in making decisions.
Entering raw data into the algorithm rarely works, so feature extraction is a critical part of the traditional machine learning workflow.
This places a huge responsibility on the programmer, and the algorithm’s efficiency relies heavily on how inventive the programmer is. For complex problems such as object recognition or handwriting recognition, this is a huge issue.
Deep learning, with the ability to learn multiple layers of representation, is one of the few methods that help us with automatic feature extraction. The lower layers can be assumed to perform automatic feature extraction, requiring little or no guidance from the programmer.
Artificial Neural Networks
The Artificial Neural Network, or just neural network for short, is not a new idea. It has been around for about 80 years.
It was not until 2011 that deep neural networks became popular, with the use of new techniques, the availability of huge datasets, and powerful computers.
A neural network mimics a biological neuron, which has dendrites, a nucleus, an axon, and axon terminals.
For a network, we need two neurons. These neurons transfer information via a synapse between the dendrites of one and the axon terminal of another.
A probable model of an artificial neuron looks like this −
A neural network will look as shown below −
The circles are neurons or nodes, which apply their functions to the data, and the lines/edges connecting them carry the weights/information being passed along.
Each column is a layer. The first layer, which receives your data, is the input layer. All the layers between the input layer and the output layer are the hidden layers.
If you have one or a few hidden layers, then you have a shallow neural network. If you have many hidden layers, then you have a deep neural network.
In this model, you have input data, you weight it, and pass it through the function in the neuron, called a threshold function or activation function.
Basically, the neuron sums all of the weighted values and compares that sum with a certain value. If the neuron fires, the result is a (1); if nothing is fired, the result is a (0). That output is then weighted and passed along to the next neuron, and the same sort of function is run.
We can have a sigmoid (s-shape) function as the activation function.
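For example, a minimal sketch of a sigmoid activation applied to a weighted sum (the input and weight values below are illustrative assumptions) −
import numpy as np

def sigmoid(z):
   # squashes any real number into the range (0, 1)
   return 1.0 / (1.0 + np.exp(-z))

inputs = np.array([0.5, -1.2, 3.0])     # example inputs
weights = np.array([0.4, 0.6, -0.1])    # weights start out random in practice
weighted_sum = np.dot(weights, inputs)
print(sigmoid(weighted_sum))            # the neuron's activation/output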
As for the weights, they are just random to start, and they are unique per input into the node/neuron.
In a typical "feed forward" network, the most basic type of neural network, your information passes straight through the network you created, and you compare the output to what you hoped the output would be, using your sample data.
From here, you need to adjust the weights to help you get your output to match your desired output.
A network in which data is sent straight through from input to output is called a feed forward neural network.
Our data goes from input, to the layers, in order, then to the output.
When we go backwards and begin adjusting weights to minimize loss/cost, this is called back propagation.
This is an optimization problem. With the neural network, in real practice, we have to deal with hundreds of thousands of variables, or millions, or more.
The first solution was to use stochastic gradient descent as the optimization method. Now, there are options like AdaGrad, the Adam optimizer and so on. Either way, this is a massive computational operation. That is why neural networks were mostly left on the shelf for over half a century. It was only very recently that our machines had the power and architecture to consider doing these operations, and properly sized datasets to match.
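As a small sketch of how this choice looks in practice (the tiny Keras model below is assumed purely for illustration; 'sgd' and 'adam' are Keras' built-in names for stochastic gradient descent and the Adam optimizer) −
from keras.models import Sequential
from keras.layers import Dense

model = Sequential([Dense(10, activation='relu', input_dim=4),
   Dense(3, activation='softmax')])

# stochastic gradient descent was the first solution ...
model.compile(optimizer='sgd', loss='categorical_crossentropy')
# ... switching to a modern alternative such as Adam is a one-line change
model.compile(optimizer='adam', loss='categorical_crossentropy')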
For simple classification tasks, the neural network is relatively close in performance to other simple algorithms like K Nearest Neighbors. The real utility of neural networks is realized when we have much larger data and much more complex questions, where they outperform other machine learning models.
Deep Neural Networks
A deep neural network (DNN) is an ANN with multiple hidden layers between the input and output layers. Similar to shallow ANNs, DNNs can model complex non-linear relationships.
The main purpose of a neural network is to receive a set of inputs, perform progressively complex calculations on them, and give output to solve real world problems like classification. We restrict ourselves to feed forward neural networks.
We have an input, an output, and a flow of sequential data in a deep network.
Neural networks are widely used in supervised learning and reinforcement learning problems. These networks are based on a set of layers connected to each other.
In deep learning, the number of hidden layers, mostly non-linear, can be large; say about 1000 layers.
DL models produce much better results than normal ML networks.
We mostly use the gradient descent method for optimizing the network and minimising the loss function.
We can use ImageNet, a repository of millions of digital images, to classify a dataset into categories like cats and dogs. DL nets are increasingly used for dynamic images apart from static ones, and for time series and text analysis.
Training the data sets forms an important part of Deep Learning models. In addition, Backpropagation is the main algorithm for training DL models.
DL deals with training large neural networks with complex input output transformations.
One example of DL is the mapping of a photo to the name of the person(s) in the photo, as is done on social networks; describing a picture with a phrase is another recent application of DL.
Neural networks are functions that have inputs like x1, x2, x3… that are transformed to outputs like z1, z2, z3 and so on, in two (shallow networks) or several intermediate operations, also called layers (deep networks).
The weights and biases change from layer to layer. ‘w’ and ‘v’ are the weights or synapses of layers of the neural networks.
The best use case of deep learning is the supervised learning problem. Here, we have a large set of data inputs with a desired set of outputs.
Here we apply the back propagation algorithm to get correct output predictions.
The most basic data set of deep learning is the MNIST, a dataset of handwritten digits.
We can train a deep Convolutional Neural Network with Keras to classify images of handwritten digits from this dataset.
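A minimal sketch of such a network, assuming Keras with its built-in MNIST loader; the layer sizes and the single training epoch are illustrative choices, not values prescribed by this tutorial −
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0
y_train, y_test = to_categorical(y_train, 10), to_categorical(y_test, 10)

model = Sequential([
   Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
   MaxPooling2D((2, 2)),
   Flatten(),
   Dense(128, activation='relu'),
   Dense(10, activation='softmax')   # one output per digit class
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=1, batch_size=128, validation_data=(x_test, y_test))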
The firing or activation of a neural net classifier produces a score. For example, to classify patients as sick or healthy, we consider parameters such as height, weight, body temperature, blood pressure, etc.
A high score means the patient is sick and a low score means the patient is healthy.
Each node in output and hidden layers has its own classifiers. The input layer takes inputs and passes on its scores to the next hidden layer for further activation and this goes on till the output is reached.
This progress from input to output, from left to right in the forward direction, is called forward propagation.
Credit assignment path (CAP) in a neural network is the series of transformations starting from the input to the output. CAPs elaborate probable causal connections between the input and the output.
For a given feed forward neural network, the CAP depth is the number of hidden layers plus one, as the output layer is included. For recurrent neural networks, where a signal may propagate through a layer several times, the CAP depth can be potentially limitless.
Deep Nets and Shallow Nets
There is no clear threshold of depth that divides shallow learning from deep learning; but it is mostly agreed that for deep learning which has multiple non-linear layers, CAP must be greater than two.
The basic node in a neural net is a perceptron, mimicking a neuron in a biological neural network. Then we have the multi-layer perceptron or MLP. Each set of inputs is modified by a set of weights and biases; each edge has a unique weight and each node has a unique bias.
The prediction accuracy of a neural net depends on its weights and biases.
The process of improving the accuracy of a neural network is called training. The output from a forward prop net is compared to the value which is known to be correct.
The cost function or the loss function is the difference between the generated output and the actual output.
The point of training is to make the cost of training as small as possible across millions of training examples. To do this, the network tweaks the weights and biases until the prediction matches the correct output.
Once trained well, a neural net has the potential to make an accurate prediction every time.
When the patterns get complex and you want your computer to recognise them, you have to go for neural networks. In such complex pattern scenarios, neural networks outperform all other competing algorithms.
There are now GPUs that can train them faster than ever before. Deep neural networks are already revolutionizing the field of AI.
Computers have proved to be good at performing repetitive calculations and following detailed instructions but have been not so good at recognising complex patterns.
If there is the problem of recognising simple patterns, a support vector machine (SVM) or a logistic regression classifier can do the job well, but as the complexity of the patterns increases, there is no way but to go for deep neural networks.
Therefore, for complex patterns like a human face, shallow neural networks fail and we have no alternative but to go for deep neural networks with more layers. The deep nets are able to do their job by breaking down the complex patterns into simpler ones. For example, for a human face, a deep net would use edges to detect parts like lips, nose, eyes, ears and so on, and then re-combine these together to form a human face.
Prediction accuracy has become so high that recently, at a Google Pattern Recognition Challenge, a deep net beat a human.
This idea of a web of layered perceptrons has been around for some time; in this area, deep nets mimic the human brain. But one downside to this is that they take a long time to train, a hardware constraint.
However, recent high performance GPUs have been able to train such deep nets in under a week, while fast CPUs could have taken weeks or perhaps months to do the same.
Choosing a Deep Net
How to choose a deep net? We have to decide if we are building a classifier or if we are trying to find patterns in the data, and if we are going to use unsupervised learning. To extract patterns from a set of unlabelled data, we use a Restricted Boltzmann Machine or an autoencoder.
Consider the following points while choosing a deep net −
-
For text processing, sentiment analysis, parsing and named entity recognition, we use a recurrent net or a recursive neural tensor network (RNTN);
-
For any language model that operates at character level, we use the recurrent net.
-
For image recognition, we use a deep belief network (DBN) or a convolutional network.
-
For object recognition, we use an RNTN or a convolutional network.
-
For speech recognition, we use a recurrent net.
In general, deep belief networks and multilayer perceptrons with rectified linear units or RELU are both good choices for classification.
For time series analysis, it is always recommended to use a recurrent net.
Neural nets have been around for more than 50 years, but only now have they risen to prominence. The reason is that they are hard to train; when we try to train them with a method called back propagation, we run into a problem called vanishing or exploding gradients. When that happens, training takes a longer time and accuracy takes a back-seat. When training on a data set, we are constantly calculating the cost function, which is the difference between the predicted output and the actual output from a set of labelled training data. The cost function is then minimized by adjusting the weight and bias values until the lowest value is obtained. The training process uses a gradient, which is the rate at which the cost will change with respect to a change in weight or bias values.
Restricted Boltzmann Networks or Autoencoders - RBNs
In 2006, a breakthrough was achieved in tackling the issue of vanishing gradients. Geoff Hinton devised a novel strategy that led to the development of the Restricted Boltzmann Machine (RBM), a shallow two-layer net.
The first layer is the visible layer and the second layer is the hidden layer. Each node in the visible layer is connected to every node in the hidden layer. The network is known as restricted because no two nodes within the same layer are allowed to share a connection.
Autoencoders are networks that encode input data as vectors. They create a hidden, or compressed, representation of the raw data. The vectors are useful in dimensionality reduction; the vector compresses the raw data into a smaller number of essential dimensions. Autoencoders are paired with decoders, which allow the reconstruction of input data based on its hidden representation.
RBM is the mathematical equivalent of a two-way translator. A forward pass takes inputs and translates them into a set of numbers that encodes the inputs. A backward pass meanwhile takes this set of numbers and translates them back into reconstructed inputs. A well-trained net performs back prop with a high degree of accuracy.
In both steps, the weights and the biases have a critical role; they help the RBM in decoding the interrelationships between the inputs and in deciding which inputs are essential in detecting patterns. Through forward and backward passes, the RBM is trained to re-construct the input with different weights and biases until the input and the re-construction are as close as possible. An interesting aspect of RBM is that the data need not be labelled. This turns out to be very important for real world data sets like photos, videos, voices and sensor data, all of which tend to be unlabelled. Instead of humans manually labelling the data, the RBM automatically sorts through the data; by properly adjusting the weights and biases, an RBM is able to extract important features and reconstruct the input. RBMs are a part of the family of feature extractor neural nets, which are designed to recognize inherent patterns in data. These are also called autoencoders because they have to encode their own structure.
Deep Belief Networks - DBNs
Deep belief networks (DBNs) are formed by combining RBMs and introducing a clever training method. We have a new model that finally solves the problem of vanishing gradients. Geoff Hinton invented the RBMs and also Deep Belief Nets as an alternative to back propagation.
A DBN is similar in structure to an MLP (multi-layer perceptron), but very different when it comes to training. It is the training that enables DBNs to outperform their shallow counterparts.
A DBN can be visualized as a stack of RBMs where the hidden layer of one RBM is the visible layer of the RBM above it. The first RBM is trained to reconstruct its input as accurately as possible.
The hidden layer of the first RBM is taken as the visible layer of the second RBM and the second RBM is trained using the outputs from the first RBM. This process is iterated till every layer in the network is trained.
In a DBN, each RBM learns the entire input. A DBN works globally by fine-tuning the entire input in succession as the model slowly improves, like a camera lens slowly focussing a picture. A stack of RBMs outperforms a single RBM just as a multi-layer perceptron (MLP) outperforms a single perceptron.
At this stage, the RBMs have detected inherent patterns in the data but without any names or labels. To finish training of the DBN, we have to introduce labels for the patterns and fine-tune the net with supervised learning.
We need a very small set of labelled samples so that the features and patterns can be associated with a name. This small labelled set of data is used for training. This set of labelled data can be very small when compared to the original data set.
The weights and biases are altered slightly, resulting in a small change in the net’s perception of the patterns and often a small increase in the total accuracy.
The training can also be completed in a reasonable amount of time by using GPUs, giving very accurate results as compared to shallow nets, and we see a solution to the vanishing gradient problem too.
Generative Adversarial Networks - GANs
Generative adversarial networks are deep neural nets comprising two nets, pitted one against the other, thus the “adversarial” name.
GANs were introduced in a paper published by researchers at the University of Montreal in 2014. Facebook’s AI expert Yann LeCun, referring to GANs, called adversarial training “the most interesting idea in the last 10 years in ML.”
GANs' potential is huge, as the networks can learn to mimic any distribution of data. GANs can be taught to create parallel worlds strikingly similar to our own in any domain: images, music, speech, prose. They are robot artists in a way, and their output is quite impressive.
In a GAN, one neural network, known as the generator, generates new data instances, while the other, the discriminator, evaluates them for authenticity.
Let us say we are trying to generate hand-written numerals like those found in the MNIST dataset, which is taken from the real world. The work of the discriminator, when shown an instance from the true MNIST dataset, is to recognize it as authentic.
Now consider the following steps of the GAN −
-
The generator network takes input in the form of random numbers and returns an image.
-
This generated image is given as input to the discriminator network along with a stream of images taken from the actual dataset.
-
The discriminator takes in both real and fake images and returns probabilities, a number between 0 and 1, with 1 representing a prediction of authenticity and 0 representing fake.
-
So you have a double feedback loop − The discriminator is in a feedback loop with the ground truth of the images, which we know. The generator is in a feedback loop with the discriminator.
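A compact sketch of this two-network setup in Keras is shown below; the layer sizes are illustrative assumptions and the training loop is only outlined in the comments −
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

# generator: random numbers in, a 28x28 image (flattened to 784 values) out
generator = Sequential([Dense(128, activation='relu', input_dim=100),
   Dense(784, activation='sigmoid')])

# discriminator: image in, probability of being authentic (1) versus fake (0) out
discriminator = Sequential([Dense(128, activation='relu', input_dim=784),
   Dense(1, activation='sigmoid')])
discriminator.compile(loss='binary_crossentropy', optimizer=Adam())

# stacked model: the generator's feedback loop runs through the (frozen) discriminator
discriminator.trainable = False
gan = Sequential([generator, discriminator])
gan.compile(loss='binary_crossentropy', optimizer=Adam())

# In each training step one would:
#  1. train the discriminator on a batch of real images (label 1) and
#     generated images (label 0) with discriminator.train_on_batch(...)
#  2. train the generator through `gan` on random noise with target label 1,
#     so it learns to fool the discriminator, with gan.train_on_batch(...)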
Recurrent Neural Networks - RNNs
RNNs are neural networks in which data can flow in any direction. These networks are used for applications such as language modelling or Natural Language Processing (NLP).
The basic concept underlying RNNs is to utilize sequential information. In a normal neural network it is assumed that all inputs and outputs are independent of each other. If we want to predict the next word in a sentence we have to know which words came before it.
RNNs are called recurrent as they repeat the same task for every element of a sequence, with the output being based on the previous computations. RNNs thus can be said to have a “memory” that captures information about what has been previously calculated. In theory, RNNs can use information in very long sequences, but in reality, they can look back only a few steps.
Long short-term memory networks (LSTMs) are the most commonly used RNNs.
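A minimal sketch of an LSTM-based recurrent net in Keras (the sequence length of 10 steps, 1 feature per step and 32 units are assumptions made only for illustration) −
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential([
   LSTM(32, input_shape=(10, 1)),   # 10 timesteps, 1 feature per step
   Dense(1)                         # prediction of the next value in the sequence
])
model.compile(optimizer='adam', loss='mse')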
Together with convolutional neural networks, RNNs have been used as part of models that generate descriptions for unlabelled images. It is quite amazing how well this seems to work.
Convolutional Deep Neural Networks - CNNs
If we increase the number of layers in a neural network to make it deeper, it increases the complexity of the network and allows us to model functions that are more complicated. However, the number of weights and biases will exponentially increase. As a matter of fact, learning such difficult problems can become impossible for normal neural networks. This leads to a solution, the convolutional neural networks.
CNNs are extensively used in computer vision; they have also been applied in acoustic modelling for automatic speech recognition.
The idea behind convolutional neural networks is the idea of a "moving filter" which passes over the image. This moving filter, or convolution, applies to a certain neighbourhood of nodes, which for example may be pixels; here, the filter applied is 0.5 times the node value −
Noted researcher Yann LeCun pioneered convolutional neural networks. Facebook uses these nets in its facial recognition software. CNNs have been the go-to solution for machine vision projects. There are many layers in a convolutional network. In the ImageNet challenge, a machine was able to beat a human at object recognition in 2015.
In a nutshell, Convolutional Neural Networks (CNNs) are multi-layer neural networks. The layers are sometimes up to 17 or more and assume the input data to be images.
CNNs drastically reduce the number of parameters that need to be tuned. So, CNNs efficiently handle the high dimensionality of raw images.
Python Deep Learning - Fundamentals
In this chapter, we will look into the fundamentals of Python Deep Learning.
Deep learning models/algorithms
Let us now learn about the different deep learning models/ algorithms.
Some of the popular models within deep learning are as follows −
-
Convolutional neural networks
-
Recurrent neural networks
-
Deep belief networks
-
Generative adversarial networks
-
Auto-encoders and so on
The inputs and outputs are represented as vectors or tensors. For example, a neural network may have the inputs where individual pixel RGB values in an image are represented as vectors.
The layers of neurons that lie between the input layer and the output layer are called hidden layers. This is where most of the work happens when the neural net tries to solve problems. Taking a closer look at the hidden layers can reveal a lot about the features the network has learned to extract from the data.
Different architectures of neural networks are formed by choosing which neurons to connect to the other neurons in the next layer.
Pseudocode for calculating output
Following is the pseudocode for calculating the output of a forward-propagating neural network −
# node[] := array of topologically sorted nodes
# An edge from a to b means a is to the left of b
# If the Neural Network has R inputs and S outputs,
# then first R nodes are input nodes and last S nodes are output nodes.
# incoming[x] := nodes connected to node x
# weight[x] := weights of incoming edges to x
For each neuron x, from left to right −
   if x <= R: do nothing  # it is an input node
   inputs[x] = [output[i] for i in incoming[x]]
   weighted_sum = dot_product(weights[x], inputs[x])
   output[x] = Activation_function(weighted_sum)
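A possible Python rendering of this pseudocode is given below; the sigmoid activation function and the tiny example graph are assumptions chosen for illustration −
import math

def activation(z):
   return 1.0 / (1.0 + math.exp(-z))           # sigmoid threshold function

def forward(node_count, R, incoming, weights, inputs):
   # nodes 0..R-1 are input nodes; outputs of all nodes are stored in one list
   output = [0.0] * node_count
   output[:R] = inputs                         # input nodes just hold the inputs
   for x in range(R, node_count):              # left to right, topological order
      node_inputs = [output[i] for i in incoming[x]]
      weighted_sum = sum(w * v for w, v in zip(weights[x], node_inputs))
      output[x] = activation(weighted_sum)
   return output

# tiny example: 2 input nodes feeding one output node (node 2)
print(forward(3, 2, {2: [0, 1]}, {2: [0.5, -0.3]}, [1.0, 2.0]))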
Training a Neural Network
We will now learn how to train a neural network. We will also learn the back propagation algorithm and the backward pass in Python Deep Learning.
We have to find the optimal values of the weights of a neural network to get the desired output. To train a neural network, we use the iterative gradient descent method. We start with random initialization of the weights. After random initialization, we make predictions on some subset of the data with the forward-propagation process, compute the corresponding cost function C, and update each weight w by an amount proportional to dC/dw, i.e., the derivative of the cost function w.r.t. the weight. The proportionality constant is known as the learning rate.
The gradients can be calculated efficiently using the back-propagation algorithm. The key observation of backward propagation or backward prop is that, because of the chain rule of differentiation, the gradient at each neuron in the neural network can be calculated using the gradients at the neurons it has outgoing edges to. Hence, we calculate the gradients backwards, i.e., first calculate the gradients of the output layer, then the top-most hidden layer, followed by the preceding hidden layer, and so on, ending at the input layer.
The back-propagation algorithm is implemented mostly using the idea of a computational graph, where each neuron is expanded to many nodes in the computational graph and performs a simple mathematical operation like addition or multiplication. The computational graph does not have any weights on the edges; all weights are assigned to the nodes, so the weights become their own nodes. The backward propagation algorithm is then run on the computational graph. Once the calculation is complete, only the gradients of the weight nodes are required for the update. The rest of the gradients can be discarded.
Gradient Descent Optimization Technique
One commonly used optimization function that adjusts weights according to the error they caused is called the “gradient descent.”
Gradient is another name for slope, and slope, on an x-y graph, represents how two variables are related to each other: the rise over the run, the change in distance over the change in time, etc. In this case, the slope is the ratio between the network’s error and a single weight; i.e., how does the error change as the weight is varied.
To put it more precisely, we want to find which weight produces the least error. We want to find the weight that correctly represents the signals contained in the input data, and translates them to a correct classification.
As a neural network learns, it slowly adjusts many weights so that they can map signal to meaning correctly. The ratio between the network's error and each of those weights is a derivative, dE/dw, that calculates the extent to which a slight change in a weight causes a slight change in the error.
Each weight is just one factor in a deep network that involves many transforms; the signal of the weight passes through activations and sums over several layers, so we use the chain rule of calculus to work back through the network activations and outputs. This leads us to the weight in question, and its relationship to the overall error.
The two variables, error and weight, are mediated by a third variable, activation, through which the weight is passed. We can calculate how a change in weight affects a change in error by first calculating how a change in activation affects a change in error, and how a change in weight affects a change in activation.
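Written as a formula, with E for the error, w for the weight and a for the mediating activation, this is just the chain rule −
\frac{\partial E}{\partial w} = \frac{\partial E}{\partial a}\ast \frac{\partial a}{\partial w}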
The basic idea in deep learning is nothing more than that: adjusting a model’s weights in response to the error it produces, until you cannot reduce the error any more.
The deep net trains slowly if the gradient value is small, and fast if the value is high. Any inaccuracies in training lead to inaccurate outputs. The process of training the net from the output back to the input is called back propagation or back prop. We know that forward propagation starts with the input and works forward. Back prop does the reverse/opposite, calculating the gradients from right to left.
Each time we calculate a gradient, we use all the previous gradients up to that point.
Let us start at a node in the output layer. The edge uses the gradient at that node. As we go back into the hidden layers, it gets more complex. The product of two numbers between 0 and 1 gives you a smaller number. The gradient values keep getting smaller, and as a result back prop takes a lot of time to train and accuracy suffers.
Challenges in Deep Learning Algorithms
There are certain challenges for both shallow neural networks and deep neural networks, like overfitting and computation time. DNNs are affected by overfitting because of the use of added layers of abstraction, which allow them to model rare dependencies in the training data.
Regularization methods such as dropout, early stopping, data augmentation and transfer learning are applied during training to combat overfitting. Dropout regularization randomly omits units from the hidden layers during training, which helps in avoiding rare dependencies. DNNs take into consideration several training parameters such as the size, i.e., the number of layers and the number of units per layer, the learning rate and the initial weights. Finding optimal parameters is not always practical due to the high cost in time and computational resources. Several hacks such as batching can speed up computation. The large processing power of GPUs has significantly helped the training process, as the matrix and vector computations required are well-executed on GPUs.
Dropout
Dropout is a popular regularization technique for neural networks. Deep neural networks are particularly prone to overfitting.
Let us now see what dropout is and how it works.
In the words of Geoffrey Hinton, one of the pioneers of Deep Learning, ‘If you have a deep neural net and it’s not overfitting, you should probably be using a bigger one and using dropout’.
Dropout is a technique where during each iteration of gradient descent, we drop a set of randomly selected nodes. This means that we ignore some nodes randomly as if they do not exist.
Each neuron is kept with a probability of q and dropped randomly with probability 1-q. The value q may be different for each layer in the neural network. A dropout probability (1-q) of 0.5 for the hidden layers and 0 for the input layer works well on a wide range of tasks.
During evaluation and prediction, no dropout is used. The output of each neuron is multiplied by q so that the input to the next layer has the same expected value.
The idea behind dropout is as follows − in a neural network without dropout regularization, neurons develop co-dependency amongst each other, which leads to overfitting.
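A small NumPy sketch of this mechanism (the activation values and the keep probability q below are assumptions used only for illustration) −
import numpy as np

q = 0.5                                       # keep probability for a hidden layer
activations = np.array([0.2, 0.9, 0.5, 0.7])

# training: each unit is kept with probability q and dropped with probability 1-q
mask = (np.random.rand(activations.size) < q).astype(float)
train_output = activations * mask

# evaluation/prediction: no units are dropped; outputs are scaled by q instead
test_output = activations * q
print(train_output, test_output)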
Early Stopping
We train neural networks using an iterative algorithm called gradient descent.
The idea behind early stopping is intuitive; we stop training when the error starts to increase. Here, by error, we mean the error measured on validation data, which is the part of the training data used for tuning hyper-parameters. In this case, the hyper-parameter is the stopping criterion.
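A hedged sketch of early stopping using Keras' built-in EarlyStopping callback; the tiny model and the random data are placeholders purely for illustration −
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import EarlyStopping

x, y = np.random.rand(200, 8), np.random.rand(200, 1)
model = Sequential([Dense(16, activation='relu', input_dim=8), Dense(1)])
model.compile(optimizer='adam', loss='mse')

# stop training as soon as the error on the validation data stops improving
early_stop = EarlyStopping(monitor='val_loss', patience=3)
model.fit(x, y, validation_split=0.2, epochs=100, callbacks=[early_stop])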
Data Augmentation
Data augmentation is the process where we increase the quantum of data we have, or augment it, by using existing data and applying some transformations to it. The exact transformations used depend on the task we intend to achieve. Moreover, the transformations that help the neural net depend on its architecture.
For instance, in many computer vision tasks such as object classification, an effective data augmentation technique is adding new data points that are cropped or translated versions of original data.
When a computer accepts an image as an input, it takes in an array of pixel values. Let us say that the whole image is shifted left by 15 pixels. We apply many different shifts in different directions, resulting in an augmented dataset many times the size of the original dataset.
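One possible way to generate such shifted copies is Keras' ImageDataGenerator, sketched below; the placeholder images and the chosen shift range are assumptions −
import numpy as np
from keras.preprocessing.image import ImageDataGenerator

images = np.random.rand(10, 64, 64, 3)          # placeholder batch of images
labels = np.random.randint(0, 2, size=(10,))

# shift images by up to 20% of their width/height in either direction
# (newer Keras versions also accept an absolute pixel count here)
datagen = ImageDataGenerator(width_shift_range=0.2, height_shift_range=0.2)
augmented = datagen.flow(images, labels, batch_size=10)
x_batch, y_batch = next(augmented)              # one batch of augmented data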
Transfer Learning
The process of taking a pre-trained model and "fine-tuning" the model with our own dataset is called transfer learning. There are several ways to do this. A few ways are described below −
-
We start with a model that has been pre-trained on a large dataset. Then, we remove the last layer of the network and replace it with a new layer with random weights.
-
We then freeze the weights of all the other layers and train the network normally. Here, freezing the layers means not changing the weights during gradient descent or optimization.
The concept behind this is that the pre-trained model will act as a feature extractor, and only the last layer will be trained on the current task.
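A sketch of this recipe in Keras follows; VGG16 pre-trained on ImageNet is an assumed choice of model, and the 10-class output layer is illustrative −
from keras.applications import VGG16
from keras.models import Sequential
from keras.layers import Flatten, Dense

base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
   layer.trainable = False             # freeze the pre-trained weights

model = Sequential([
   base,
   Flatten(),
   Dense(10, activation='softmax')     # new last layer with random weights
])
model.compile(optimizer='adam', loss='categorical_crossentropy')
# model.fit(...) would now train only the new layer on our own dataset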
Computational Graphs
Backpropagation is implemented in deep learning frameworks like Tensorflow, Torch, Theano, etc., by using computational graphs. More significantly, understanding back propagation on computational graphs combines several different algorithms and their variations, such as backprop through time and backprop with shared weights. Once everything is converted into a computational graph, they are still the same algorithm − just back propagation on computational graphs.
What is a Computational Graph
A computational graph is defined as a directed graph where the nodes correspond to mathematical operations. Computational graphs are a way of expressing and evaluating a mathematical expression.
For example, here is a simple mathematical equation −
p = x+y
We can draw a computational graph of the above equation as follows.
The above computational graph has an addition node (a node with a "+" sign) with two input variables x and y and one output p.
Let us take another example, slightly more complex. We have the following equation.
g = \left (x+y \right ) \ast z
The above equation is represented by the following computational graph.
Computational Graphs and Backpropagation
Computational graphs and backpropagation, both are important core concepts in deep learning for training neural networks.
Forward Pass
Forward pass is the procedure for evaluating the value of the mathematical expression represented by a computational graph. Doing a forward pass means we are passing the values from the variables in the forward direction, from the left (input) to the right, where the output is.
Let us consider an example by giving some value to all of the inputs. Suppose, the following values are given to all of the inputs.
x=1, y=3, z=−3
By giving these values to the inputs, we can perform forward pass and get the following values for the outputs on each node.
First, we use the value of x = 1 and y = 3, to get p = 4.
Then we use p = 4 and z = -3 to get g = -12. We go from left to right, forwards.
Objectives of Backward Pass
In the backward pass, our intention is to compute the gradients for each input with respect to the final output. These gradients are essential for training the neural network using gradient descent.
For example, we desire the following gradients.
Desired gradients
\frac{\partial g}{\partial x}, \frac{\partial g}{\partial y}, \frac{\partial g}{\partial z}
Backward pass (backpropagation)
We start the backward pass by finding the derivative of the final output with respect to the final output (itself!). Thus, it results in the identity derivative and the value is equal to one.
\frac{\partial g}{\partial g} = 1
Our computational graph now looks as shown below −
Next, we will do the backward pass through the "*" operation. We will calculate the gradients at p and z. Since g = p*z, we know that −
\frac{\partial g}{\partial z} = p
\frac{\partial g}{\partial p} = z
We already know the values of z and p from the forward pass. Hence, we get −
\frac{\partial g}{\partial z} = p = 4
and
\frac{\partial g}{\partial p} = z = -3
We want to calculate the gradients at x and y −
\frac{\partial g}{\partial x}, \frac{\partial g}{\partial y}
However, we want to do this efficiently (although x and g are only two hops away in this graph, imagine them being really far from each other). To calculate these values efficiently, we will use the chain rule of differentiation. From chain rule, we have −
\frac{\partial g}{\partial x}=\frac{\partial g}{\partial p}\ast \frac{\partial p}{\partial x}
\frac{\partial g}{\partial y}=\frac{\partial g}{\partial p}\ast \frac{\partial p}{\partial y}
But we already know that dg/dp = -3, and dp/dx and dp/dy are easy since p directly depends on x and y. We have −
p=x+y\Rightarrow \frac{\partial p}{\partial x} = 1, \frac{\partial p}{\partial y} = 1
Hence, we get −
\frac{\partial g} {\partial x} = \frac{\partial g} {\partial p}\ast \frac{\partial p} {\partial x} = \left ( -3 \right ).1 = -3
In addition, for the input y −
\frac{\partial g} {\partial y} = \frac{\partial g} {\partial p}\ast \frac{\partial p} {\partial y} = \left ( -3 \right ).1 = -3
The main reason for doing this backwards is that when we had to calculate the gradient at x, we only used values that were already computed, and dp/dx (the derivative of a node's output with respect to that same node's input). We used local information to compute a global value.
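The forward and backward passes computed above can be written out directly in Python −
x, y, z = 1.0, 3.0, -3.0

# forward pass, left to right
p = x + y             # p = 4
g = p * z             # g = -12

# backward pass, right to left, using the chain rule
dg_dp = z             # -3
dg_dz = p             #  4
dg_dx = dg_dp * 1.0   # dp/dx = 1, so dg/dx = -3
dg_dy = dg_dp * 1.0   # dp/dy = 1, so dg/dy = -3
print(g, dg_dx, dg_dy, dg_dz)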
Steps for training a neural network
Follow these steps to train a neural network −
-
For a data point x in the dataset, we do a forward pass with x as input, and calculate the cost c as output.
-
We do a backward pass starting at c, and calculate gradients for all nodes in the graph. This includes nodes that represent the neural network weights.
-
We then update the weights by doing W = W - learning rate * gradients.
-
We repeat this process until the stopping criteria are met.
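A minimal NumPy sketch of these four steps for a single linear neuron is shown below; the data, learning rate and number of iterations are illustrative assumptions −
import numpy as np

X = np.array([[0.0], [1.0], [2.0], [3.0]])
T = np.array([[1.0], [3.0], [5.0], [7.0]])       # targets follow t = 2x + 1
W, b = np.random.randn(1, 1), np.zeros((1, 1))
learning_rate = 0.05

for step in range(500):
   Y = X.dot(W) + b                              # forward pass
   cost = ((Y - T) ** 2).mean()                  # cost c
   grad_W = 2 * X.T.dot(Y - T) / len(X)          # backward pass: gradient w.r.t. W
   grad_b = 2 * (Y - T).mean()                   # gradient w.r.t. b
   W -= learning_rate * grad_W                   # W = W - learning rate * gradients
   b -= learning_rate * grad_b
print(W, b, cost)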
Python Deep Learning - Applications
Deep learning has produced good results for a few applications such as computer vision, language translation, image captioning, audio transcription, molecular biology, speech recognition, natural language processing, self-driving cars, brain tumour detection, real-time speech translation, music composition, automatic game playing and so on.
Deep learning is the next big leap after machine learning with a more advanced implementation. Currently, it is heading towards becoming an industry standard bringing a strong promise of being a game changer when dealing with raw unstructured data.
Deep learning is currently one of the best solution providers for a wide range of real-world problems. Developers are building AI programs that, instead of using previously given rules, learn from examples to solve complicated tasks. With deep learning being used by many data scientists, deeper neural networks are delivering results that are ever more accurate.
The idea is to develop deep neural networks by increasing the number of training layers for each network; the machine learns more about the data until it is as accurate as possible. Developers can use deep learning techniques to implement complex machine learning tasks, and train AI networks to have high levels of perceptual recognition.
Deep learning finds its popularity in Computer vision. Here one of the tasks achieved is image classification where given input images are classified as cat, dog, etc. or as a class or label that best describe the image. We as humans learn how to do this task very early in our lives and have these skills of quickly recognizing patterns, generalizing from prior knowledge, and adapting to different image environments.
Libraries and Frameworks
In this chapter, we will relate deep learning to the different libraries and frameworks.
Deep learning and Theano
If we want to start coding a deep neural network, it is better to have an idea of how different frameworks like Theano, TensorFlow, Keras, PyTorch, etc. work.
Theano is a Python library which provides a set of functions for building deep nets that train quickly on our machine.
Theano was developed at the University of Montreal, Canada, under the leadership of Yoshua Bengio, a deep net pioneer.
Theano lets us define and evaluate mathematical expressions with vectors and matrices which are rectangular arrays of numbers.
Technically speaking, both neural nets and input data can be represented as matrices and all standard net operations can be redefined as matrix operations. This is important since computers can carry out matrix operations very quickly.
We can process multiple matrix values in parallel and if we build a neural net with this underlying structure, we can use a single machine with a GPU to train enormous nets in a reasonable time window.
However, if we use Theano, we have to build the deep net from the ground up. The library does not provide complete functionality for creating a specific type of deep net.
Instead, we have to code every aspect of the deep net like the model, the layers, the activation, the training method and any special methods to stop overfitting.
The good news, however, is that Theano allows building our implementation on top of vectorized functions, providing us with a highly optimized solution.
There are many other libraries that extend the functionality of Theano. TensorFlow and Keras can be used with Theano as backend.
Deep Learning with TensorFlow
Google's TensorFlow is a Python library. This library is a great choice for building commercial grade deep learning applications.
TensorFlow grew out of another library DistBelief V2 that was a part of Google Brain Project. This library aims to extend the portability of machine learning so that research models could be applied to commercial-grade applications.
Much like the Theano library, TensorFlow is based on computational graphs, where a node represents persistent data or a math operation and edges represent the flow of data between nodes, which is a multidimensional array or tensor; hence the name TensorFlow.
The output from an operation or a set of operations is fed as input into the next.
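A minimal sketch of this idea in TensorFlow (assuming TensorFlow 2.x with eager execution; the values are only illustrative) −
import tensorflow as tf
# Tensors flow along the edges; each operation's output feeds the next one
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.matmul(a, a)        # matrix multiplication
c = tf.reduce_sum(b)       # sum of all elements of the previous result
print(c.numpy())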
尽管 TensorFlow 是为神经网络设计的,但它也适用于其他可以将计算建模为数据流图的网络。
Even though TensorFlow was designed for neural networks, it works well for other nets where the computation can be modelled as a data flow graph.
TensorFlow 还使用了 Theano 的几个特性,如通用和子表达式消除、自动微分、共享和符号变量。
TensorFlow also uses several features from Theano, such as common sub-expression elimination, auto differentiation, and shared and symbolic variables.
可以使用 TensorFlow 构建不同类型的深度网络,如卷积网络、自动编码器、RNTN、RNN、RBM、DBM/MLP 等。
Different types of deep nets can be built using TensorFlow like convolutional nets, Autoencoders, RNTN, RNN, RBM, DBM/MLP and so on.
但是,TensorFlow 中不支持超参数配置。对于该功能,我们可以使用 Keras。
However, there is no support for hyperparameter configuration in TensorFlow. For this functionality, we can use Keras.
Deep Learning and Keras
Keras 是一个易于使用的强大 Python 库,用于开发和评估深度学习模型。
Keras is a powerful easy-to-use Python library for developing and evaluating deep learning models.
它采用简约的设计,允许我们逐层构建网络;训练和运行网络。
It has a minimalist design that allows us to build a net layer by layer, train it, and run it.
它封装了高效的数值计算库 Theano 和 TensorFlow,并允许我们在几行代码中定义和训练神经网络模型。
It wraps the efficient numerical computation libraries Theano and TensorFlow and allows us to define and train neural network models in a few short lines of code.
它是一个高级神经网络 API,有助于广泛使用深度学习和人工智能。它在许多较低级库(包括 TensorFlow、Theano 等)之上运行。Keras 代码是可移植的;我们可以使用 Theano 或 TensorFlow 作为后端在 Keras 中实现神经网络,而不会更改任何代码。
It is a high-level neural network API, helping to make wide use of deep learning and artificial intelligence. It runs on top of a number of lower-level libraries including TensorFlow, Theano, and so on. Keras code is portable; we can implement a neural network in Keras using Theano or TensorFlow as a backend without any changes in the code.
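For example, a small classifier can be sketched in just a few lines (a minimal illustration; the layer sizes here are arbitrary assumptions, not part of the case study that follows) −
from keras.models import Sequential
from keras.layers import Dense
model = Sequential()
model.add(Dense(8, activation = 'relu', input_dim = 4))   # hidden layer
model.add(Dense(1, activation = 'sigmoid'))               # output layer
model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])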
Python Deep Learning - Implementations
在这个深度学习的实现中,我们的目标是为一家特定的银行预测客户流失或流失数据,即可能离开这家银行服务的客户。使用的数据集相对较小,包含 10000 行,14 列。我们使用 Anaconda 发行版和 Theano、TensorFlow 和 Keras 等框架。Keras 构建在以 Tensorflow 和 Theano 为后端的代码之上。
In this implementation of deep learning, our objective is to predict customer attrition or churn for a certain bank, that is, which customers are likely to leave this bank's service. The dataset used is relatively small and contains 10000 rows with 14 columns. We are using the Anaconda distribution, and frameworks like Theano, TensorFlow and Keras. Keras is built on top of TensorFlow and Theano, which function as its backends.
# Artificial Neural Network
# Installing Theano
pip install --upgrade theano
# Installing Tensorflow
pip install --upgrade tensorflow
# Installing Keras
pip install --upgrade keras
Step 1: Data preprocessing
In[]:
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Churn_Modelling.csv')
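To take a quick look at the imported data, we can print its shape and the first few rows (an optional check; shape and head() are standard pandas attributes) −
print(dataset.shape)    # (10000, 14)
print(dataset.head())   # first five rows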
Step 2
我们创建了数据集的特征矩阵和目标变量,即第 14 列,标记为“Exited”。
We create matrices of the features of the dataset and the target variable, which is column 14, labeled as “Exited”.
数据的初始外观如下所示 −
The initial look of data is as shown below −
In[]:
X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values
X
Step 4
通过对字符串变量进行编码简化分析。我们使用 ScikitLearn 函数“LabelEncoder”自动对列中的不同标签进行编码,值介于 0 到 n_classes-1 之间。
We make the analysis simpler by encoding string variables. We are using the ScikitLearn function ‘LabelEncoder’ to automatically encode the different labels in the columns with values between 0 and n_classes-1.
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X_1 = LabelEncoder()
X[:,1] = labelencoder_X_1.fit_transform(X[:,1])
labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])
X
Output
在上面的结果中,国家名称已替换为 0、1 和 2;而男性和女性已替换为 0 和 1。
In the above output, country names are replaced by 0, 1 and 2, while male and female are replaced by 0 and 1.
Step 5
Labelling Encoded Data
我们使用相同的 ScikitLearn 库和另一个名为 OneHotEncoder 的函数,只是为了传递列号,以创建一个哑变量。
We use the same ScikitLearn library and another function called OneHotEncoder, to which we just pass the column number, in order to create dummy variables.
onehotencoder = OneHotEncoder(categorical_features = [1])
X = onehotencoder.fit_transform(X).toarray()
X = X[:, 1:]
X
现在,前 2 列表示国家,第 4 列表示性别。
Now, the first 2 columns represent the country and the 4th column represents the gender.
Output
我们总是将我们的数据分为训练和测试部分;我们在训练数据上训练我们的模型,然后我们检查模型在测试数据上的准确性,这有助于评估模型的效率。
We always divide our data into training and testing parts; we train our model on the training data and then check its accuracy on the testing data, which helps in evaluating the efficiency of the model.
Step 6
我们使用 ScikitLearn 的 train_test_split 函数将我们的数据拆分为训练集和测试集。我们将训练与测试的拆分比例保持为 80:20。
We are using ScikitLearn’s train_test_split function to split our data into a training set and a test set. We keep the train-to-test split ratio as 80:20.
#Splitting the dataset into the Training set and the Test Set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)
一些变量的值为数千,而另一些变量的值为十或一。我们对数据进行缩放,以便它们更具代表性。
Some variables have values in thousands while some have values in tens or ones. We scale the data so that they are more representative.
Step 7
在这个代码中,我们使用 StandardScaler 函数对训练数据进行拟合和转换。我们对缩放进行标准化,以便使用相同拟合的方法来转换/缩放测试数据。
In this code, we are fitting and transforming the training data using the StandardScaler function. We standardize our scaling so that we use the same fitted method to transform/scale test data.
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
Output
现在数据已正确缩放。最后,我们完成了数据预处理。现在,我们将从我们的模型开始。
The data is now scaled properly. Finally, we are done with our data pre-processing. Now, we will start with our model.
Step 8
我们在这里导入所需的模块。我们需要顺序模块来初始化神经网络,需要密集模块来添加隐藏层。
We import the required modules here. We need the Sequential module to initialize the neural network and the Dense module to add the hidden layers.
# Importing the Keras libraries and packages
import keras
from keras.models import Sequential
from keras.layers import Dense
Step 9
我们将模型命名为 Classifier,因为我们的目标是对客户流失进行分类。然后我们使用 Sequential 模块进行初始化。
We will name the model as Classifier as our aim is to classify customer churn. Then we use the Sequential module for initialization.
#Initializing Neural Network
classifier = Sequential()
Step 10
我们使用 dense 函数逐个添加隐藏层。在下面的代码中,我们将看到许多参数。
We add the hidden layers one by one using the dense function. In the code below, we will see many arguments.
我们的第一个参数是 output_dim 。它是我们添加到此层的节点数。 init 是随机梯度下降的初始化。在神经网络中,我们为每个节点分配权重。在初始化时,权重应接近于零,并且我们使用均匀函数随机初始化权重。只为第一层需要 input_dim 参数,因为模型不知道我们输入变量的数量。在此,输入变量的总数为 11。在第二层,模型自动从第一隐藏层了解输入变量的数量。
Our first parameter is output_dim (called units in newer Keras versions). It is the number of nodes we add to this layer. init (kernel_initializer in newer Keras) is the initialization used before Stochastic Gradient Descent. In a neural network we assign weights to each node. At initialization, the weights should be near zero, and we randomly initialize them using the uniform function. The input_dim parameter is needed only for the first layer, as the model does not know the number of our input variables. Here the total number of input variables is 11. In the second layer, the model automatically knows the number of input variables from the first hidden layer.
执行以下代码行以添加输入层和第一隐藏层 −
Execute the following line of code to add the input layer and the first hidden layer −
classifier.add(Dense(units = 6, kernel_initializer = 'uniform',
activation = 'relu', input_dim = 11))
执行以下代码行以添加第二隐藏层 −
Execute the following line of code to add the second hidden layer −
classifier.add(Dense(units = 6, kernel_initializer = 'uniform',
activation = 'relu'))
执行以下代码行以添加输出层 −
Execute the following line of code to add the output layer −
classifier.add(Dense(units = 1, kernel_initializer = 'uniform',
activation = 'sigmoid'))
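At this point we can optionally print a summary of the layers we have stacked so far (summary() is a standard Keras model method) −
classifier.summary()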
Step 11
Compiling the ANN
我们现在已向分类器添加了多层。我们现在将使用 compile 方法编译它们。在最终编译中添加的参数控制完成神经网络。所以,我们在这一步中需要小心。
We have added multiple layers to our classifier until now. We will now compile them using the compile method. The arguments added in the final compilation control the complete neural network, so we need to be careful in this step.
以下是参数的简要说明。
Here is a brief explanation of the arguments.
第一个参数是 Optimizer 。它是一种用于找到最佳权重设置的算法。这个算法称为 Stochastic Gradient Descent (SGD) 。我们会从多种类型中使用一种,即“Adam 优化器”。SGD 取决于损失,所以我们的第二个参数是损失。如果我们的因变量是二元的,我们使用称为 ‘binary_crossentropy’ 的对数损失函数。如果我们的因变量在输出中有多于两类,我们使用 ‘categorical_crossentropy’ 。我们希望基于 accuracy 改善神经网络的性能,所以我们添加 metrics 作为一个准确度。
The first argument is the Optimizer. This is the algorithm used to find the optimal set of weights, called Stochastic Gradient Descent (SGD). Here we are using one among several variants, the ‘Adam’ optimizer. The SGD depends on loss, so our second parameter is loss. If our dependent variable is binary, we use the logarithmic loss function called ‘binary_crossentropy’, and if our dependent variable has more than two categories in the output, then we use ‘categorical_crossentropy’. We want to improve the performance of our neural network based on accuracy, so we add metrics as accuracy.
# Compiling Neural Network
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
Fitting the ANN to the Training Set
现在,我们对训练数据训练我们的模型。我们使用 fit 方法来拟合我们的模型。我们还优化权重以提高模型效率。为此,我们必须更新权重。 Batch size 是我们在更新权重后的观察次数。 Epoch 是迭代总数。我们会通过实验法选择批次大小和轮次的值。
We now train our model on the training data. We use the fit method to fit our model. We also optimize the weights to improve model efficiency. For this, we have to update the weights. Batch size is the number of observations after which we update the weights. An epoch is one full pass over the training data, and epochs is the total number of such passes. The values of batch size and epochs are chosen by trial and error.
classifier.fit(X_train, y_train, batch_size = 10, epochs = 50)
Making predictions and evaluating the model
# Predicting the Test set results
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)
Predicting a single new observation
# Predicting a single new observation
"""Our goal is to predict if the customer with the following data will leave the bank:
Geography: Spain
Credit Score: 500
Gender: Female
Age: 40
Tenure: 3
Balance: 50000
Number of Products: 2
Has Credit Card: Yes
Is Active Member: Yes
Step 13
Predicting the test set result
预测结果会告诉你客户离开公司的可能性。我们会将该可能性转换为二进制 0 和 1。
The prediction result will give the probability of the customer leaving the company. We will convert that probability into a binary 0 or 1.
# Predicting the Test set results
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)
new_prediction = classifier.predict(sc.transform(
   np.array([[0.0, 0, 500, 1, 40, 3, 50000, 2, 1, 1, 40000]])))
new_prediction = (new_prediction > 0.5)
Step 14
这是最后一步,我们将评估模型性能。我们已经有了原始结果,因此我们可以构建混淆矩阵来检查我们模型的准确度。
This is the last step, where we evaluate our model's performance. We already have the original results and thus we can build a confusion matrix to check the accuracy of our model.
Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print (cm)
Output
loss: 0.3384 acc: 0.8605
[[1541   54]
 [ 230  175]]
从混淆矩阵来看,我们模型的准确率可以计算为 −
From the confusion matrix, the Accuracy of our model can be calculated as −
Accuracy = (1541 + 175) / 2000 = 0.858
We achieved 85.8% accuracy, which is good.
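Equivalently, the same figure can be computed directly from the confusion matrix cm obtained above −
accuracy = (cm[0, 0] + cm[1, 1]) / cm.sum()
print(accuracy)   # ~0.858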
The Forward Propagation Algorithm
在本节中,我们将会学习如何编写代码来对简单的神经网络进行正向传播(预测) −
In this section, we will learn how to write code to do forward propagation (prediction) for a simple neural network −
每个数据点都是客户。第一个输入是他们有多少个帐户,第二个输入是他们有多少个孩子。该模型会预测用户明年会进行多少交易。
Each data point is a customer. The first input is how many accounts they have, and the second input is how many children they have. The model will predict how many transactions the user makes in the next year.
输入数据已作为输入数据预加载,权重都在一个称为 weights 的字典中。隐藏层中第一个节点的权重数组在 weights['node_0'] 中,隐藏层中第二个节点的权重数组则在 weights['node_1'] 中。
The input data is pre-loaded as input_data, and the weights are in a dictionary called weights. The array of weights for the first node in the hidden layer is in weights['node_0'], and for the second node in the hidden layer in weights['node_1'] respectively.
输入输出节点的权重在 weights 中可用。
The weights feeding into the output node are available in weights['output'].
The Rectified Linear Activation Function
“激活函数”是在每个节点中工作的函数。它将节点的输入转换为某些输出。
An "activation function" is a function that works at each node. It transforms the node’s input into some output.
修正线性激活函数 (简称 ReLU) 广泛用于超高性能网络。该函数采用单个数字作为输入,如果输入为负数,则返回 0;如果输入为正数,则将其作为输出返回。
The rectified linear activation function (called ReLU) is widely used in very high-performance networks. This function takes a single number as input, returning 0 if the input is negative, and the input itself if the input is positive.
这里列出一些示例 −
Here are some examples −
-
relu(4) = 4
-
relu(-2) = 0
我们填写 relu() 函数的定义−
We fill in the definition of the relu() function−
-
We use the max() function to calculate the value for the output of relu().
-
We apply the relu() function to node_0_input to calculate node_0_output.
-
We apply the relu() function to node_1_input to calculate node_1_output.
import numpy as np
input_data = np.array([-1, 2])
weights = {
'node_0': np.array([3, 3]),
'node_1': np.array([1, 5]),
'output': np.array([2, -1])
}
# Forward propagation for one data point (tanh is used as the activation here)
node_0_input = (input_data * weights['node_0']).sum()
node_0_output = np.tanh(node_0_input)
node_1_input = (input_data * weights['node_1']).sum()
node_1_output = np.tanh(node_1_input)
hidden_layer_output = np.array([node_0_output, node_1_output])
output = (hidden_layer_output * weights['output']).sum()
print(output)
def relu(input):
'''Define your relu activation function here'''
# Calculate the value for the output of the relu function: output
output = max(input,0)
# Return the value just calculated
return(output)
# Calculate node 0 value: node_0_output
node_0_input = (input_data * weights['node_0']).sum()
node_0_output = relu(node_0_input)
# Calculate node 1 value: node_1_output
node_1_input = (input_data * weights['node_1']).sum()
node_1_output = relu(node_1_input)
# Put node values into array: hidden_layer_outputs
hidden_layer_outputs = np.array([node_0_output, node_1_output])
# Calculate model output (do not apply relu)
model_output = (hidden_layer_outputs * weights['output']).sum()
print(model_output)   # Print model output
Applying the network to many Observations/rows of data
在本节中,我们将学习如何定义一个名为 predict_with_network() 的函数。此函数将生成对多个数据观测的预测,这些观测取自以上网络,作为 input_data。使用以上网络中给出的权重。relu() 函数定义也正在使用中。
In this section, we will learn how to define a function called predict_with_network(). This function will generate predictions for multiple data observations, taken from the network above as input_data. The weights given in the above network are being used. The relu() function definition is also being used.
让我们定义一个名为 predict_with_network() 的函数,该函数接受两个参数 - input_data_row 和 weights - 并返回来自网络的预测作为输出。
Let us define a function called predict_with_network() that accepts two arguments - input_data_row and weights - and returns a prediction from the network as the output.
我们计算每个节点的输入和输出值,将它们存储为:node_0_input、node_0_output、node_1_input 和 node_1_output。
We calculate the input and output values for each node, storing them as: node_0_input, node_0_output, node_1_input, and node_1_output.
要计算节点的输入值,我们将相关数组相乘并计算它们的和。
To calculate the input value of a node, we multiply the relevant arrays together and compute their sum.
要计算节点的输出值,我们将 relu() 函数应用于节点的输入值。我们使用“for 循环”来迭代 input_data -
To calculate the output value of a node, we apply the relu() function to the input value of the node. We use a ‘for loop’ to iterate over input_data −
我们还使用 predict_with_network() 为 input_data 中的每一行(input_data_row)生成预测。我们还将每个预测附加到 results 中。
We also use our predict_with_network() to generate predictions for each row of the input_data - input_data_row. We also append each prediction to results.
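For the loop below to produce one prediction per observation, input_data should hold several observation rows rather than the single array defined earlier. A minimal sketch with purely illustrative rows (these values are assumptions, not taken from the original exercise) −
input_data = [np.array([-1, 2]), np.array([3, 5]), np.array([0, 0])]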
# Define predict_with_network()
def predict_with_network(input_data_row, weights):
# Calculate node 0 value
node_0_input = (input_data_row * weights['node_0']).sum()
node_0_output = relu(node_0_input)
# Calculate node 1 value
node_1_input = (input_data_row * weights['node_1']).sum()
node_1_output = relu(node_1_input)
# Put node values into array: hidden_layer_outputs
hidden_layer_outputs = np.array([node_0_output, node_1_output])
# Calculate model output
input_to_final_layer = (hidden_layer_outputs*weights['output']).sum()
model_output = relu(input_to_final_layer)
# Return model output
return(model_output)
# Create empty list to store prediction results
results = []
for input_data_row in input_data:
# Append prediction to results
results.append(predict_with_network(input_data_row, weights))
print(results)# Print results
Deep multi-layer neural networks
这里我们正在编写代码来对具有两个隐藏层的神经网络进行前向传播。每个隐藏层有两个节点。输入数据已预加载为 input_data 。第一个隐藏层中的节点称为 node_0_0 和 node_0_1。
Here we are writing code to do forward propagation for a neural network with two hidden layers. Each hidden layer has two nodes. The input data has been preloaded as input_data. The nodes in the first hidden layer are called node_0_0 and node_0_1.
它们的权重分别预加载为 weights['node_0_0'] 和 weights['node_0_1']。
Their weights are pre-loaded as weights['node_0_0'] and weights['node_0_1'] respectively.
第二个隐藏层中的节点称为 node_1_0 and node_1_1 。它们的权重分别预加载为 weights['node_1_0'] 和 weights['node_1_1'] 。
The nodes in the second hidden layer are called node_1_0 and node_1_1. Their weights are pre-loaded as weights['node_1_0'] and weights['node_1_1'] respectively.
然后我们使用预加载为 weights['output'] 的权重从隐藏节点创建模型输出。
We then create a model output from the hidden nodes using weights pre-loaded as weights['output'].
我们使用其权重 weights['node_0_0'] 和给定的 input_data 计算 node_0_0_input。然后应用 relu() 函数以获得 node_0_0_output。
We calculate node_0_0_input using its weights weights['node_0_0'] and the given input_data. Then apply the relu() function to get node_0_0_output.
我们对 node_0_1_input 做与上面相同的事,以获得 node_0_1_output。
We do the same as above for node_0_1_input to get node_0_1_output.
我们使用其权重 weights['node_1_0'] 和来自第一个隐藏层输出 hidden_0_outputs 计算 node_1_0_input。然后我们应用 relu() 函数以获得 node_1_0_output。
We calculate node_1_0_input using its weights weights['node_1_0'] and the outputs from the first hidden layer - hidden_0_outputs. We then apply the relu() function to get node_1_0_output.
我们对 node_1_1_input 做与上面相同的事,以获得 node_1_1_output。
We do the same as above for node_1_1_input to get node_1_1_output.
我们使用 weights['output'] 和来自第二个隐藏层 hidden_1_outputs 数组的输出计算 model_output。我们不对此输出应用 relu() 函数。
We calculate model_output using weights['output'] and the outputs from the second hidden layer, the hidden_1_outputs array. We do not apply the relu() function to this output.
import numpy as np
input_data = np.array([3, 5])
weights = {
'node_0_0': np.array([2, 4]),
'node_0_1': np.array([4, -5]),
'node_1_0': np.array([-1, 1]),
'node_1_1': np.array([2, 2]),
'output': np.array([2, 7])
}
def predict_with_network(input_data):
# Calculate node 0 in the first hidden layer
node_0_0_input = (input_data * weights['node_0_0']).sum()
node_0_0_output = relu(node_0_0_input)
# Calculate node 1 in the first hidden layer
node_0_1_input = (input_data*weights['node_0_1']).sum()
node_0_1_output = relu(node_0_1_input)
# Put node values into array: hidden_0_outputs
hidden_0_outputs = np.array([node_0_0_output, node_0_1_output])
# Calculate node 0 in the second hidden layer
node_1_0_input = (hidden_0_outputs*weights['node_1_0']).sum()
node_1_0_output = relu(node_1_0_input)
# Calculate node 1 in the second hidden layer
node_1_1_input = (hidden_0_outputs*weights['node_1_1']).sum()
node_1_1_output = relu(node_1_1_input)
# Put node values into array: hidden_1_outputs
hidden_1_outputs = np.array([node_1_0_output, node_1_1_output])
# Calculate model output: model_output
model_output = (hidden_1_outputs*weights['output']).sum()
# Return model_output
return(model_output)
output = predict_with_network(input_data)
print(output)
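Working through the arithmetic with the inputs and weights above (and assuming the relu() defined earlier is in scope), the first hidden layer evaluates to [26, 0], the second hidden layer to [0, 52], and the printed model output is 2*0 + 7*52 = 364.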