Python Deep Learning Tutorial

Deep Neural Networks

A deep neural network (DNN) is an ANN with multiple hidden layers between the input and output layers. Like shallow ANNs, DNNs can model complex non-linear relationships.

The main purpose of a neural network is to receive a set of inputs, perform progressively complex computations on them, and produce output to solve real-world problems such as classification. Here we restrict ourselves to feed-forward neural networks.

In a deep network, we have an input, an output, and a sequential flow of data.

[Figure: a deep network]

Neural networks are widely used in supervised learning and reinforcement learning problems. These networks are based on a set of layers connected to each other.

In deep learning, the number of hidden layers, mostly non-linear, can be large, say about 1000 layers.

DL models generally produce much better results than shallow ML approaches.

We mostly use the gradient descent method to optimize the network and minimize the loss function.
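
A minimal sketch of gradient descent on a one-parameter model with a squared-error loss (all names and values below are illustrative):

```python
# Gradient descent on a single-weight model; the data, learning rate,
# and step count are illustrative choices.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])       # inputs
y = 2.0 * x                              # targets (true weight is 2.0)

w = 0.0                                  # initial weight guess
lr = 0.05                                # learning rate

for step in range(100):
    y_pred = w * x                       # forward pass
    loss = np.mean((y_pred - y) ** 2)    # mean squared error
    grad = np.mean(2 * (y_pred - y) * x) # dLoss/dw
    w -= lr * grad                       # gradient descent update

print(w)  # approaches 2.0 as the loss is minimized
```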

We can use ImageNet, a repository of millions of digital images, to classify a dataset into categories like cats and dogs. Apart from static images, DL nets are increasingly used for dynamic images as well as for time series and text analysis.

Training on data sets forms an important part of deep learning models. In addition, backpropagation is the main algorithm for training DL models.

DL deals with training large neural networks with complex input-output transformations.

One example of DL is the mapping of a photo to the name of the person(s) in the photo, as done on social networks; describing a picture with a phrase is another recent application of DL.

[Figure: DL mapping of photos to names]

Neural networks are functions that take inputs like x1, x2, x3, … and transform them into outputs like z1, z2, z3, and so on through two (shallow networks) or several (deep networks) intermediate operations, also called layers.

The weights and biases change from layer to layer. ‘w’ and ‘v’ are the weights, or synapses, of layers of the neural network.

The best use case of deep learning is the supervised learning problem. Here, we have a large set of data inputs with a desired set of outputs.

[Figure: backpropagation algorithm]

Here we apply the backpropagation algorithm to get correct output predictions.

The most basic dataset of deep learning is MNIST, a dataset of handwritten digits.

We can train a deep convolutional neural network with Keras to classify images of handwritten digits from this dataset.
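
A minimal sketch of such a network, assuming the tf.keras API (the filter counts and epoch count are illustrative, untuned choices):

```python
# A small CNN classifier for MNIST digits using Keras.
import tensorflow as tf
from tensorflow.keras import layers, models

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0   # scale to [0, 1], add channel axis
x_test = x_test[..., None] / 255.0

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),  # one score per digit class
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, validation_data=(x_test, y_test))
```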

The firing, or activation, of a neural net classifier produces a score. For example, to classify patients as sick or healthy, we consider parameters such as height, weight, body temperature, blood pressure, etc.

A high score means the patient is sick and a low score means the patient is healthy.

Each node in the output and hidden layers has its own classifier. The input layer takes the inputs and passes its scores on to the next hidden layer for further activation, and this goes on till the output is reached.

This progress from input to output, from left to right in the forward direction, is called forward propagation.
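
A minimal sketch of forward propagation through a single hidden layer, assuming sigmoid activations (the weight matrices below stand in for the ‘w’ and ‘v’ weights and are random illustrative values):

```python
# One forward pass: inputs -> hidden layer -> outputs.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 0.3])        # inputs x1, x2, x3

W1 = np.random.randn(4, 3) * 0.1      # hidden-layer weights ('w')
b1 = np.zeros(4)                      # hidden-layer biases
W2 = np.random.randn(2, 4) * 0.1      # output-layer weights ('v')
b2 = np.zeros(2)                      # output-layer biases

h = sigmoid(W1 @ x + b1)              # hidden activations
z = sigmoid(W2 @ h + b2)              # outputs z1, z2

print(z)
```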

The credit assignment path (CAP) in a neural network is the series of transformations starting from the input to the output. CAPs elaborate probable causal connections between the input and the output.

For a given feed-forward neural network, the CAP depth is the number of hidden layers plus one, as the output layer is included. For recurrent neural networks, where a signal may propagate through a layer several times, the CAP depth can be potentially unlimited.

Deep Nets and Shallow Nets

There is no clear threshold of depth that divides shallow learning from deep learning; but it is mostly agreed that for deep learning, which has multiple non-linear layers, the CAP must be greater than two.

The basic node in a neural net is a perceptron, mimicking a neuron in a biological neural network. Then we have the multi-layer perceptron, or MLP. Each set of inputs is modified by a set of weights and biases; each edge has a unique weight and each node has a unique bias.

The prediction accuracy of a neural net depends on its weights and biases.

The process of improving the accuracy of a neural network is called training. The output from a forward-prop net is compared to the value that is known to be correct.

The cost function, or loss function, is the difference between the generated output and the actual output.
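
A minimal sketch of one common choice of cost function, the mean squared error (the values here are illustrative):

```python
# Mean squared error between generated outputs and known correct outputs.
import numpy as np

predicted = np.array([0.8, 0.2, 0.6])   # outputs from a forward pass
actual = np.array([1.0, 0.0, 1.0])      # known correct outputs

mse = np.mean((predicted - actual) ** 2)
print(mse)   # training tries to drive this value as low as possible
```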

The point of training is to make the cost of training as small as possible across millions of training examples. To do this, the network tweaks the weights and biases until the prediction matches the correct output.

Once trained well, a neural net has the potential to make an accurate prediction every time.

When patterns get complex and you want your computer to recognize them, you have to go for neural networks. In such complex pattern scenarios, neural networks outperform all other competing algorithms.

There are now GPUs that can train them faster than ever before. Deep neural networks are already revolutionizing the field of AI.

Computers have proved to be good at performing repetitive calculations and following detailed instructions, but have been not so good at recognizing complex patterns.

If there is a problem of recognizing simple patterns, a support vector machine (SVM) or a logistic regression classifier can do the job well, but as the complexity of the pattern increases, there is no way but to go for deep neural networks.

Therefore, for complex patterns like a human face, shallow neural networks fail and there is no alternative but to go for deep neural networks with more layers. The deep nets are able to do their job by breaking down the complex patterns into simpler ones. For example, for a human face, a deep net would use edges to detect parts like lips, nose, eyes, ears and so on, and then re-combine these together to form a human face.

The accuracy of prediction has become so high that recently, at a Google pattern recognition challenge, a deep net beat a human.

This idea of a web of layered perceptrons has been around for some time; in this area, deep nets mimic the human brain. But one downside to this is that they take a long time to train, a hardware constraint.

However, recent high-performance GPUs have been able to train such deep nets in under a week, while fast CPUs could have taken weeks or perhaps months to do the same.

Choosing a Deep Net

How do we choose a deep net? We have to decide if we are building a classifier or if we are trying to find patterns in the data, and whether we are going to use unsupervised learning. To extract patterns from a set of unlabelled data, we use a Restricted Boltzmann Machine or an autoencoder.

Consider the following points while choosing a deep net −

  1. For text processing, sentiment analysis, parsing, and named entity recognition, we use a recurrent net or a recursive neural tensor network (RNTN).

  2. For any language model that operates at the character level, we use a recurrent net.

  3. For image recognition, we use a deep belief network (DBN) or a convolutional network.

  4. For object recognition, we use an RNTN or a convolutional network.

  5. For speech recognition, we use a recurrent net.

In general, deep belief networks and multilayer perceptrons with rectified linear units (ReLU) are both good choices for classification.

For time series analysis, it is always recommended to use a recurrent net.

Neural nets have been around for more than 50 years, but only now have they risen to prominence. The reason is that they are hard to train; when we try to train them with a method called backpropagation, we run into a problem called vanishing or exploding gradients. When that happens, training takes longer and accuracy takes a back seat.

When training on a data set, we constantly calculate the cost function, which is the difference between the predicted output and the actual output from a set of labelled training data. The cost function is then minimized by adjusting the weight and bias values until the lowest value is obtained. The training process uses a gradient, which is the rate at which the cost changes with respect to a change in weight or bias values.

Restricted Boltzmann Networks or Autoencoders - RBNs

In 2006, a breakthrough was achieved in tackling the issue of vanishing gradients. Geoff Hinton devised a novel strategy that led to the development of the Restricted Boltzmann Machine (RBM), a shallow two-layer net.

The first layer is the visible layer and the second layer is the hidden layer. Each node in the visible layer is connected to every node in the hidden layer. The network is known as restricted because no two nodes within the same layer are allowed to share a connection.

Autoencoders are networks that encode input data as vectors. They create a hidden, or compressed, representation of the raw data. The vectors are useful in dimensionality reduction; the vector compresses the raw data into a smaller number of essential dimensions. Autoencoders are paired with decoders, which allows the reconstruction of input data based on its hidden representation.
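
A minimal sketch of this encode-decode pairing in Keras (assuming the tf.keras API; the 32-dimensional bottleneck is an illustrative choice):

```python
# A small autoencoder: compress a flattened image, then reconstruct it.
import tensorflow as tf
from tensorflow.keras import layers, models

inputs = layers.Input(shape=(784,))                   # flattened 28x28 image
encoded = layers.Dense(32, activation="relu")(inputs)        # compressed code
decoded = layers.Dense(784, activation="sigmoid")(encoded)   # reconstruction

autoencoder = models.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

# Trained to reproduce its own input: x is both data and target.
(x_train, _), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784) / 255.0
autoencoder.fit(x_train, x_train, epochs=3, batch_size=256)
```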

An RBM is the mathematical equivalent of a two-way translator. A forward pass takes inputs and translates them into a set of numbers that encodes the inputs. A backward pass, meanwhile, takes this set of numbers and translates them back into reconstructed inputs. A well-trained net performs this backward reconstruction with a high degree of accuracy.

In both steps, the weights and the biases have a critical role; they help the RBM decode the interrelationships between the inputs and decide which inputs are essential for detecting patterns. Through forward and backward passes, the RBM is trained to reconstruct the input with different weights and biases until the input and the reconstruction are as close as possible.

An interesting aspect of the RBM is that the data need not be labelled. This turns out to be very important for real-world data sets like photos, videos, voices, and sensor data, all of which tend to be unlabelled. Instead of having humans manually label the data, the RBM automatically sorts through it; by properly adjusting the weights and biases, an RBM is able to extract important features and reconstruct the input. The RBM is part of the family of feature-extractor neural nets, which are designed to recognize inherent patterns in data. These are also called autoencoders because they have to encode their own structure.
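
A minimal sketch of unsupervised RBM training, using scikit-learn's BernoulliRBM on stand-in binary data (the hidden-layer size and hyperparameters are illustrative assumptions):

```python
# Fit an RBM without labels, then take the forward pass as features.
import numpy as np
from sklearn.neural_network import BernoulliRBM

X = (np.random.rand(1000, 784) > 0.5).astype(float)  # stand-in binary data

rbm = BernoulliRBM(n_components=64,   # size of the hidden layer
                   learning_rate=0.05,
                   n_iter=10)
rbm.fit(X)                            # unsupervised: no labels needed

hidden = rbm.transform(X)             # forward pass: hidden representation
print(hidden.shape)                   # (1000, 64)
```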

[Figure: RBM structure]

Deep Belief Networks - DBNs

Deep belief networks (DBNs) are formed by combining RBMs and introducing a clever training method. We have a new model that finally solves the problem of the vanishing gradient. Geoff Hinton invented the RBMs and also Deep Belief Nets as an alternative to backpropagation.

A DBN is similar in structure to an MLP (multi-layer perceptron), but very different when it comes to training. It is the training that enables DBNs to outperform their shallow counterparts.

A DBN can be visualized as a stack of RBMs where the hidden layer of one RBM is the visible layer of the RBM above it. The first RBM is trained to reconstruct its input as accurately as possible.

The hidden layer of the first RBM is taken as the visible layer of the second RBM, and the second RBM is trained using the outputs from the first RBM. This process is iterated till every layer in the network is trained.
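
A minimal sketch of this greedy layer-wise procedure, again using scikit-learn's BernoulliRBM (the layer sizes and stand-in data are illustrative assumptions):

```python
# Greedy layer-wise pre-training: each RBM is trained on the hidden
# activations of the one below it.
import numpy as np
from sklearn.neural_network import BernoulliRBM

X = (np.random.rand(1000, 784) > 0.5).astype(float)  # stand-in binary data

layer_sizes = [256, 64]        # hidden sizes of the stacked RBMs
rbms, data = [], X
for size in layer_sizes:
    rbm = BernoulliRBM(n_components=size, learning_rate=0.05, n_iter=10)
    rbm.fit(data)              # train this RBM to reconstruct its input
    data = rbm.transform(data) # its hidden layer feeds the next RBM
    rbms.append(rbm)

print(data.shape)              # (1000, 64): top-level representation
# A supervised fine-tuning step on a small labelled set would follow.
```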

In a DBN, each RBM learns the entire input. A DBN works globally by fine-tuning the entire input in succession as the model slowly improves, like a camera lens slowly bringing a picture into focus. A stack of RBMs outperforms a single RBM, just as a multi-layer perceptron (MLP) outperforms a single perceptron.

At this stage, the RBMs have detected inherent patterns in the data, but without any names or labels. To finish training the DBN, we have to introduce labels for the patterns and fine-tune the net with supervised learning.

We need a very small set of labelled samples so that the features and patterns can be associated with names. This small labelled set of data is used for training. It can be very small when compared to the original data set.

The weights and biases are altered slightly, resulting in a small change in the net’s perception of the patterns and often a small increase in the total accuracy.

The training can also be completed in a reasonable amount of time by using GPUs, giving very accurate results as compared to shallow nets, and we see a solution to the vanishing gradient problem too.

Generative Adversarial Networks - GANs

Generative adversarial networks are deep neural nets comprising two nets, pitted one against the other, hence the “adversarial” name.

GANs were introduced in a paper published by researchers at the University of Montreal in 2014. Facebook’s AI expert Yann LeCun, referring to GANs, called adversarial training “the most interesting idea in the last 10 years in ML.”

GANs’ potential is huge, as the networks can learn to mimic any distribution of data. GANs can be taught to create parallel worlds strikingly similar to our own in any domain: images, music, speech, prose. They are robot artists in a way, and their output is quite impressive.

In a GAN, one neural network, known as the generator, generates new data instances, while the other, the discriminator, evaluates them for authenticity.

Let us say we are trying to generate handwritten numerals like those found in the MNIST dataset, which is taken from the real world. The work of the discriminator, when shown an instance from the true MNIST dataset, is to recognize it as authentic.

Now consider the following steps of the GAN (a minimal code sketch follows the list) −

  1. The generator network takes input in the form of random numbers and returns an image.

  2. This generated image is given as input to the discriminator network along with a stream of images taken from the actual dataset.

  3. The discriminator takes in both real and fake images and returns a probability, a number between 0 and 1, with 1 representing a prediction of authenticity and 0 representing fake.

  4. So you have a double feedback loop − The discriminator is in a feedback loop with the ground truth of the images, which we know. The generator is in a feedback loop with the discriminator.
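
Below is a minimal sketch of this double feedback loop in Keras (tf.keras assumed). The network sizes, uniform noise prior, and loop length are illustrative assumptions, not a tuned implementation:

```python
# A toy GAN on flattened MNIST digits, mirroring the four steps above.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

noise_dim = 100

# 1. Generator: random numbers in, a flattened 28x28 "image" out.
generator = models.Sequential([
    layers.Dense(128, activation="relu", input_shape=(noise_dim,)),
    layers.Dense(784, activation="sigmoid"),
])

# 2-3. Discriminator: an image in, a probability of authenticity out.
discriminator = models.Sequential([
    layers.Dense(128, activation="relu", input_shape=(784,)),
    layers.Dense(1, activation="sigmoid"),
])
discriminator.compile(optimizer="adam", loss="binary_crossentropy")

# 4. The combined model trains the generator through the discriminator.
discriminator.trainable = False
gan = models.Sequential([generator, discriminator])
gan.compile(optimizer="adam", loss="binary_crossentropy")

(x_train, _), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784) / 255.0
batch = 64

for step in range(1000):
    # Discriminator sees real images (label 1) and fakes (label 0).
    real = x_train[np.random.randint(0, len(x_train), batch)]
    fake = generator.predict(np.random.rand(batch, noise_dim), verbose=0)
    discriminator.train_on_batch(real, np.ones((batch, 1)))
    discriminator.train_on_batch(fake, np.zeros((batch, 1)))
    # Generator is rewarded when the discriminator says "authentic".
    gan.train_on_batch(np.random.rand(batch, noise_dim), np.ones((batch, 1)))
```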

Recurrent Neural Networks - RNNs

RNNs are neural networks in which data can flow in any direction. These networks are used for applications such as language modelling or Natural Language Processing (NLP).

The basic concept underlying RNNs is to utilize sequential information. In a normal neural network, it is assumed that all inputs and outputs are independent of each other. But if we want to predict the next word in a sentence, we have to know which words came before it.

RNNs are called recurrent as they repeat the same task for every element of a sequence, with the output being based on the previous computations. RNNs can thus be said to have a “memory” that captures information about what has been previously calculated. In theory, RNNs can use information in very long sequences, but in reality, they can look back only a few steps.

[Figure: recurrent neural network]

Long short-term memory networks (LSTMs) are the most commonly used RNNs.
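
A minimal sketch of an LSTM in Keras (tf.keras assumed) learning to predict the next value of a toy sine-wave sequence; the data and window length are illustrative assumptions:

```python
# Next-step prediction on a sequential signal with an LSTM.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

series = np.sin(np.linspace(0, 50, 1000))   # a toy sequential signal
window = 20

# Build (samples, timesteps, features) windows and next-step targets.
X = np.stack([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]
X = X[..., None]                             # add the feature axis

model = models.Sequential([
    layers.LSTM(32, input_shape=(window, 1)),  # the recurrent "memory"
    layers.Dense(1),                           # next-value prediction
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32)
```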

Together with convolutional neural networks, RNNs have been used as part of a model to generate descriptions for unlabelled images. It is quite amazing how well this seems to work.

Convolutional Deep Neural Networks - CNNs

If we increase the number of layers in a neural network to make it deeper, it increases the complexity of the network and allows us to model functions that are more complicated. However, the number of weights and biases will increase exponentially. As a matter of fact, learning such difficult problems can become impossible for normal neural networks. This leads to a solution, the convolutional neural networks.

CNNs are extensively used in computer vision; they have also been applied in acoustic modelling for automatic speech recognition.

The idea behind convolutional neural networks is that of a “moving filter” which passes through the image. This moving filter, or convolution, applies to a certain neighbourhood of nodes, which for example may be pixels, where the filter applied is 0.5 times the node value −
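
A minimal sketch of such a moving filter in NumPy; the 3x3 filter of constant 0.5 weights mirrors the 0.5-times-the-node-value example, and the explicit loops are for clarity rather than speed:

```python
# Slide a 3x3 filter over an image and sum the weighted neighbourhood.
import numpy as np

image = np.random.rand(6, 6)          # stand-in for pixel intensities
kernel = np.full((3, 3), 0.5)         # each covered node weighted by 0.5

out_h = image.shape[0] - kernel.shape[0] + 1
out_w = image.shape[1] - kernel.shape[1] + 1
output = np.zeros((out_h, out_w))

for i in range(out_h):                # slide the filter over the image
    for j in range(out_w):
        patch = image[i:i + 3, j:j + 3]     # neighbourhood of nodes
        output[i, j] = np.sum(patch * kernel)

print(output.shape)                   # (4, 4) feature map
```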

The noted researcher Yann LeCun pioneered convolutional neural networks. Facebook uses these nets in its facial recognition software. CNNs have become the go-to solution for machine vision projects. There are many layers in a convolutional network. In the ImageNet challenge, a machine was able to beat a human at object recognition in 2015.

In a nutshell, convolutional neural networks (CNNs) are multi-layer neural networks. The layers are sometimes up to 17 or more, and the input data is assumed to be images.

[Figure: convolutional neural network]

CNNs drastically reduce the number of parameters that need to be tuned. So, CNNs efficiently handle the high dimensionality of raw images.
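
A back-of-the-envelope comparison of parameter counts makes the point; the layer sizes below are illustrative assumptions:

```python
# Why weight sharing matters: fully connected vs. convolutional layer.
height, width, channels = 224, 224, 3
n_inputs = height * width * channels          # 150,528 raw pixel values

# Fully connected: every input wired to each of 64 hidden units.
dense_params = n_inputs * 64 + 64             # weights + biases
print(dense_params)                           # 9,633,856

# Convolutional: 64 filters of size 3x3x3, shared across all positions.
conv_params = (3 * 3 * channels) * 64 + 64    # weights + biases
print(conv_params)                            # 1,792
```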