Artificial Neural Network - Quick Guide

Artificial Neural Network - Basic Concepts

Neural networks are parallel computing devices, which are basically an attempt to make a computer model of the brain. The main objective is to develop a system that performs various computational tasks faster than traditional systems. These tasks include pattern recognition and classification, approximation, optimization, and data clustering.

What is Artificial Neural Network?

Artificial Neural Network (ANN) is an efficient computing system whose central theme is borrowed from the analogy of biological neural networks. ANNs are also known as "artificial neural systems," "parallel distributed processing systems," or "connectionist systems." An ANN consists of a large collection of units that are interconnected in some pattern to allow communication between the units. These units, also referred to as nodes or neurons, are simple processors which operate in parallel.

Every neuron is connected to other neurons through connection links. Each connection link is associated with a weight that carries information about the input signal. This is the most useful information for neurons to solve a particular problem, because the weight usually excites or inhibits the signal being communicated. Each neuron has an internal state, which is called an activation signal. Output signals, which are produced after combining the input signals and the activation rule, may be sent to other units.

A Brief History of ANN

The history of ANN can be divided into the following three eras −

ANN during 1940s to 1960s

Some key developments of this era are as follows −

  1. 1943 − The concept of neural networks is generally considered to have started with the work of physiologist Warren McCulloch and mathematician Walter Pitts, who in 1943 modeled a simple neural network using electrical circuits in order to describe how neurons in the brain might work.

  2. 1949 − Donald Hebb's book, The Organization of Behavior, put forth the idea that repeated activation of one neuron by another increases the strength of the connection each time they are used.

  3. 1956 − An associative memory network was introduced by Taylor.

  4. 1958 − Rosenblatt invented the Perceptron, a learning method for the McCulloch-Pitts neuron model.

  5. 1960 − Bernard Widrow and Marcian Hoff developed models called "ADALINE" and "MADALINE."

ANN during 1960s to 1980s

Some key developments of this era are as follows −

  1. 1961 − Rosenblatt made an unsuccessful attempt at training multilayer networks but proposed the "backpropagation" scheme.

  2. 1964 − Taylor constructed a winner-take-all circuit with inhibitions among output units.

  3. 1969 − Minsky and Papert published Perceptrons, an analysis of the limitations of the single-layer perceptron.

  4. 1971 − Kohonen developed associative memories.

  5. 1976 − Stephen Grossberg and Gail Carpenter developed Adaptive Resonance Theory.

ANN from 1980s till Present

Some key developments of this era are as follows −

  1. 1982 − The major development was Hopfield's Energy approach.

  2. 1985 − The Boltzmann machine was developed by Ackley, Hinton, and Sejnowski.

  3. 1986 − Rumelhart, Hinton, and Williams introduced the Generalised Delta Rule.

  4. 1988 − Kosko developed the Bidirectional Associative Memory (BAM) and also gave the concept of Fuzzy Logic in ANN.

The historical review shows that significant progress has been made in this field. Neural network based chips are emerging and applications to complex problems are being developed. Surely, today is a period of transition for neural network technology.

Biological Neuron

A nerve cell (neuron) is a special biological cell that processes information. According to an estimate, there are a huge number of neurons, approximately 10^11, with numerous interconnections, approximately 10^15.

Schematic Diagram

[Figure: schematic diagram of a biological neuron]

Working of a Biological Neuron

As shown in the above diagram, a typical neuron consists of the following four parts, with the help of which we can explain its working −

  1. Dendrites − They are tree-like branches, responsible for receiving information from the other neurons it is connected to. In a sense, we can say that they are like the ears of the neuron.

  2. Soma − It is the cell body of the neuron and is responsible for processing the information received from the dendrites.

  3. Axon − It is just like a cable through which the neuron sends information.

  4. Synapses − They are the connections between the axon and the dendrites of other neurons.

ANN versus BNN

Before taking a look at the differences between Artificial Neural Network (ANN) and Biological Neural Network (BNN), let us take a look at the similarities based on the terminology between these two.

Biological Neural Network (BNN) | Artificial Neural Network (ANN)
Soma | Node
Dendrites | Input
Synapse | Weights or Interconnections
Axon | Output

The following table shows a comparison between ANN and BNN based on some criteria.

Criteria | BNN | ANN
Processing | Massively parallel, slow but superior to ANN | Massively parallel, fast but inferior to BNN
Size | 10^11 neurons and 10^15 interconnections | 10^2 to 10^4 nodes (mainly depends on the type of application and the network designer)
Learning | They can tolerate ambiguity | Very precise, structured and formatted data is required to tolerate ambiguity
Fault tolerance | Capable of robust performance even with partial damage | Performance degrades with even partial damage
Storage capacity | Stores the information in the synapses | Stores the information in continuous memory locations

Model of Artificial Neural Network

The following diagram represents the general model of ANN followed by its processing.

[Figure: general model of an artificial neural network]

For the above general model of artificial neural network, the net input can be calculated as follows −

y_{in}\:=\:x_{1}.w_{1}\:+\:x_{2}.w_{2}\:+\:x_{3}.w_{3}\:+\:\dotso\:+\:x_{m}.w_{m}

i.e., Net input $y_{in}\:=\:\sum_i^m\:x_{i}.w_{i}$

The output can be calculated by applying the activation function over the net input.

Y\:=\:F(y_{in})

Output = function (net input calculated)
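The net input and output computation above can be sketched in Python (a minimal illustration; the input values, weights, and threshold below are hypothetical, and the step activation stands in for a generic F):

```python
def net_input(x, w):
    """Net input y_in = x1*w1 + x2*w2 + ... + xm*wm."""
    return sum(xi * wi for xi, wi in zip(x, w))

def step(y_in, theta=0.0):
    """A simple threshold activation: output 1 when the net input exceeds theta."""
    return 1 if y_in > theta else 0

x = [0.5, 0.3, 0.2]  # input signals (hypothetical values)
w = [0.4, 0.7, 0.2]  # connection weights (hypothetical values)

y_in = net_input(x, w)     # 0.5*0.4 + 0.3*0.7 + 0.2*0.2 = 0.45
y = step(y_in, theta=0.3)  # 0.45 > 0.3, so the unit outputs 1
```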

Artificial Neural Network - Building Blocks

Processing of ANN depends upon the following three building blocks −

  1. Network Topology

  2. Adjustments of Weights or Learning

  3. Activation Functions

In this chapter, we will discuss these three building blocks of ANN in detail.

Network Topology

A network topology is the arrangement of a network along with its nodes and connecting lines. According to the topology, ANN can be classified as the following kinds −

Feedforward Network

It is a non-recurrent network having processing units/nodes in layers, and all the nodes in a layer are connected to the nodes of the previous layer. The connections carry different weights. There is no feedback loop, which means the signal can flow in only one direction, from input to output. It may be divided into the following two types −

  1. Single layer feedforward network − The concept is of feedforward ANN having only one weighted layer. In other words, we can say the input layer is fully connected to the output layer.

[Figure: single layer feedforward network]

  2. Multilayer feedforward network − The concept is of feedforward ANN having more than one weighted layer. As this network has one or more layers between the input and the output layer, these are called hidden layers.

[Figure: multilayer feedforward network]
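The one-way, layer-by-layer signal flow described above can be sketched as follows (layer sizes, weight values, and the sigmoid activation are illustrative assumptions, not from the text):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights):
    """One weighted layer: each node sums its weighted inputs, then activates."""
    return [sigmoid(sum(x * w for x, w in zip(inputs, row))) for row in weights]

# 3 inputs -> 2 hidden nodes -> 1 output node; all weights are hypothetical
w_hidden = [[0.2, -0.5, 0.1], [0.4, 0.3, -0.2]]
w_output = [[0.7, -0.6]]

hidden = layer_forward([1.0, 0.5, -1.0], w_hidden)  # signal flows forward only
output = layer_forward(hidden, w_output)            # no feedback loop anywhere
```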

Feedback Network

As the name suggests, a feedback network has feedback paths, which means the signal can flow in both directions using loops. This makes it a non-linear dynamic system, which changes continuously until it reaches a state of equilibrium. It may be divided into the following types −

  1. Recurrent networks − They are feedback networks with closed loops. Following are the two types of recurrent networks.

  2. Fully recurrent network − It is the simplest neural network architecture because all nodes are connected to all other nodes and each node works as both input and output.

[Figure: fully recurrent network]

  3. Jordan network − It is a closed loop network in which the output goes to the input again as feedback, as shown in the following diagram.

[Figure: Jordan network]

Adjustments of Weights or Learning

Learning, in artificial neural network, is the method of modifying the weights of connections between the neurons of a specified network. Learning in ANN can be classified into three categories namely supervised learning, unsupervised learning, and reinforcement learning.

Supervised Learning

As the name suggests, this type of learning is done under the supervision of a teacher. This learning process is dependent.

During the training of ANN under supervised learning, the input vector is presented to the network, which produces an output vector. This output vector is compared with the desired output vector. An error signal is generated if there is a difference between the actual output and the desired output vector. On the basis of this error signal, the weights are adjusted until the actual output matches the desired output.

[Figure: supervised learning]

Unsupervised Learning

As the name suggests, this type of learning is done without the supervision of a teacher. This learning process is independent.

During the training of ANN under unsupervised learning, input vectors of similar type are combined to form clusters. When a new input pattern is applied, the neural network gives an output response indicating the class to which the input pattern belongs.

There is no feedback from the environment as to what the desired output should be or whether it is correct or incorrect. Hence, in this type of learning, the network itself must discover the patterns and features from the input data, and the relations between the input data and the output.

[Figure: unsupervised learning]

Reinforcement Learning

As the name suggests, this type of learning is used to reinforce or strengthen the network over some critic information. This learning process is similar to supervised learning; however, we might have very little information.

During the training of the network under reinforcement learning, the network receives some feedback from the environment. This makes it somewhat similar to supervised learning. However, the feedback obtained here is evaluative, not instructive, which means there is no teacher as in supervised learning. After receiving the feedback, the network adjusts its weights to get better critic information in the future.

[Figure: reinforcement learning]

Activation Functions

It may be defined as the extra force or effort applied over the input to obtain an exact output. In ANN, we apply activation functions over the net input to obtain the output. Following are some activation functions of interest −

Linear Activation Function

It is also called the identity function as it performs no input editing. It can be defined as −

F(x)\:=\:x

Sigmoid Activation Function

It is of the following two types −

  1. Binary sigmoidal function − This activation function performs input editing between 0 and 1. It is positive in nature. It is always bounded, which means its output cannot be less than 0 or more than 1. It is also strictly increasing in nature, which means the higher the input, the higher the output. It can be defined as F(x)\:=\:sigm(x)\:=\:\frac{1}{1\:+\:exp(-x)}

  2. Bipolar sigmoidal function − This activation function performs input editing between -1 and 1. It can be positive or negative in nature. It is always bounded, which means its output cannot be less than -1 or more than 1. It is also strictly increasing in nature like the binary sigmoid function. It can be defined as F(x)\:=\:sigm(x)\:=\:\frac{2}{1\:+\:exp(-x)}\:-\:1\:=\:\frac{1\:-\:exp(-x)}{1\:+\:exp(-x)}
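Both sigmoidal functions can be written directly from their definitions (a minimal sketch; the probe values are arbitrary):

```python
import math

def binary_sigmoid(x):
    """Bounded in (0, 1) and strictly increasing."""
    return 1.0 / (1.0 + math.exp(-x))

def bipolar_sigmoid(x):
    """Bounded in (-1, 1): 2 / (1 + exp(-x)) - 1."""
    return 2.0 / (1.0 + math.exp(-x)) - 1.0
```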

Learning and Adaptation

As stated earlier, ANN is completely inspired by the way the biological nervous system, i.e. the human brain, works. The most impressive characteristic of the human brain is its ability to learn, hence the same feature is acquired by ANN.

What Is Learning in ANN?

Basically, learning means adjusting and adapting to change as and when the environment changes. ANN is a complex system, or more precisely we can say that it is a complex adaptive system, which can change its internal structure based on the information passing through it.

Why Is It Important?

Being a complex adaptive system, learning in ANN implies that a processing unit is capable of changing its input/output behavior due to a change in the environment. The importance of learning in ANN increases because, once a particular network is constructed, the activation function and the input/output vectors are fixed. Hence, to change the input/output behavior, we need to adjust the weights.

Classification

It may be defined as the process of learning to distinguish the data of samples into different classes by finding common features between the samples of the same classes. For example, to perform training of ANN, we have some training samples with unique features, and to perform its testing we have some testing samples with other unique features. Classification is an example of supervised learning.

Neural Network Learning Rules

We know that, during ANN learning, to change the input/output behavior, we need to adjust the weights. Hence, a method is required with the help of which the weights can be modified. These methods are called Learning rules, which are simply algorithms or equations. Following are some learning rules for the neural network −

Hebbian Learning Rule

This rule, one of the oldest and simplest, was introduced by Donald Hebb in his book The Organization of Behavior in 1949. It is a kind of feed-forward, unsupervised learning.

Basic Concept − This rule is based on a proposal given by Hebb, who wrote −

“When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.”

From the above postulate, we can conclude that the connections between two neurons might be strengthened if the neurons fire at the same time and might weaken if they fire at different times.

Mathematical Formulation − According to the Hebbian learning rule, following is the formula to increase the weight of connection at every time step.

\Delta w_{ji}(t)\:=\:\alpha x_{i}(t).y_{j}(t)

Here, $\Delta w_{ji}(t)$ = increment by which the weight of connection increases at time step t

$\alpha$ = the positive and constant learning rate

$x_{i}(t)$ = the input value from the pre-synaptic neuron at time step t

$y_{j}(t)$ = the output of the post-synaptic neuron at the same time step t
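The update formula can be sketched in one line of Python (the learning rate and the input/output values are hypothetical):

```python
def hebbian_update(w, x, y, alpha=0.1):
    """Delta w_ji = alpha * x_i * y_j: weights grow where input and output fire together."""
    return [wi + alpha * xi * y for wi, xi in zip(w, x)]

w = [0.0, 0.0]
# first input fires together with the output, second input stays silent
w = hebbian_update(w, x=[1.0, 0.0], y=1.0)
```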

Perceptron Learning Rule

This rule is an error-correcting supervised learning algorithm for single-layer feedforward networks with a linear activation function, introduced by Rosenblatt.

Basic Concept − Being supervised in nature, to calculate the error, there would be a comparison between the desired/target output and the actual output. If any difference is found, then a change must be made to the weights of the connections.

Mathematical Formulation − To explain its mathematical formulation, suppose we have 'n' number of finite input vectors, x(n), along with their desired/target output vectors t(n), where n = 1 to N.

Now the output 'y' can be calculated, as explained earlier, on the basis of the net input, and the activation function applied over that net input can be expressed as follows −

y\:=\:f(y_{in})\:=\:\begin{cases}1, & y_{in}\:>\:\theta \\0, & y_{in}\:\leqslant\:\theta\end{cases}

Where θ is the threshold.

The updating of weight can be done in the following two cases −

Case I − when t ≠ y, then

w(new)\:=\:w(old)\:+\:tx

Case II − when t = y, then

No change in weight

Delta Learning Rule (Widrow-Hoff Rule)

Introduced by Bernard Widrow and Marcian Hoff, and also called the Least Mean Square (LMS) method, it minimizes the error over all training patterns. It is a kind of supervised learning algorithm with a continuous activation function.

Basic Concept − The base of this rule is the gradient-descent approach. The delta rule updates the synaptic weights so as to minimize the difference between the net input to the output unit and the target value.

Mathematical Formulation − To update the synaptic weights, the delta rule is given by

\Delta w_{i}\:=\:\alpha\:.x_{i}.e_{j}

Here $\Delta w_{i}$ = weight change for the i-th pattern;

$\alpha$ = the positive and constant learning rate;

$x_{i}$ = the input value from the pre-synaptic neuron;

$e_{j}$ = $(t\:-\:y_{in})$, the difference between the desired/target output and the actual output $y_{in}$

The above delta rule is for a single output unit only.

The updating of weight can be done in the following two cases −

Case-I − when t ≠ y, then

w(new)\:=\:w(old)\:+\:\Delta w

Case-II − when t = y, then

No change in weight
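Repeatedly applying the delta rule to a single pattern drives the error toward zero; a minimal sketch (the pattern, target, and learning rate are hypothetical):

```python
def delta_update(w, x, t, alpha=0.2):
    """One Widrow-Hoff step: e = t - y_in, then Delta w_i = alpha * x_i * e."""
    y_in = sum(xi * wi for xi, wi in zip(x, w))  # net input of the output unit
    e = t - y_in                                 # error against the target
    return [wi + alpha * xi * e for wi, xi in zip(w, x)], e

w = [0.0, 0.0]
errors = []
for _ in range(20):
    w, e = delta_update(w, x=[1.0, -1.0], t=1.0)
    errors.append(abs(e))
# the absolute error shrinks geometrically toward zero
```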

Competitive Learning Rule (Winner-takes-all)

It is concerned with unsupervised training in which the output nodes try to compete with each other to represent the input pattern. To understand this learning rule, we must understand the competitive network, which is given as follows −

Basic Concept of Competitive Network − This network is just like a single layer feedforward network with feedback connections between the outputs. The connections between outputs are of the inhibitory type, shown by dotted lines, which means the competitors never support themselves.

[Figure: competitive network]

Basic Concept of Competitive Learning Rule − As said earlier, there will be a competition among the output nodes. Hence, the main concept is that during training, the output unit with the highest activation for a given input pattern will be declared the winner. This rule is also called Winner-takes-all because only the winning neuron is updated and the rest of the neurons are left unchanged.

Mathematical formulation − Following are the three important factors for the mathematical formulation of this learning rule −

  1. Condition to be a winner − Suppose a neuron $y_{k}$ wants to be the winner, then there would be the following condition − y_{k}\:=\:\begin{cases}1 & if\:v_{k}\:>\:v_{j}\:for\:all\:j,\:j\:\neq\:k\\0 & otherwise\end{cases}

It means that if any neuron, say $y_{k}$, wants to win, then its induced local field (the output of the summation unit), say $v_{k}$, must be the largest among all the other neurons in the network.

  2. Condition of sum total of weight − Another constraint over the competitive learning rule is that the sum total of weights to a particular output neuron is going to be 1. For example, if we consider neuron k then − \displaystyle\sum\limits_{j}w_{kj}\:=\:1\:\:\:\:\:\:\:\:\:for\:all\:k

  3. Change of weight for winner − If a neuron does not respond to the input pattern, then no learning takes place in that neuron. However, if a particular neuron wins, then the corresponding weights are adjusted as follows − \Delta w_{kj}\:=\:\begin{cases}\alpha(x_{j}\:-\:w_{kj}), & if\:neuron\:k\:wins\\0, & if\:neuron\:k\:loses\end{cases}

Here $\alpha$ is the learning rate.

This clearly shows that we are favoring the winning neuron by adjusting its weights, and if a neuron loses, we need not bother to re-adjust its weights.

Outstar Learning Rule

This rule, introduced by Grossberg, is concerned with supervised learning because the desired outputs are known. It is also called Grossberg learning.

Basic Concept − This rule is applied over the neurons arranged in a layer. It is specially designed to produce the desired output d of a layer of p neurons.

Mathematical Formulation − The weight adjustments in this rule are computed as follows

\Delta w_{j}\:=\:\alpha\:(d\:-\:w_{j})

Here d is the desired neuron output and $\alpha$ is the learning rate.
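Iterating the outstar update pulls the weight vector toward the desired output d (the layer size, the values of d, and the learning rate are hypothetical):

```python
def outstar_update(w, d, alpha=0.2):
    """Delta w_j = alpha * (d_j - w_j): weights move toward the desired output."""
    return [wj + alpha * (dj - wj) for wj, dj in zip(w, d)]

w = [0.0, 0.0, 0.0]
d = [1.0, 0.0, 0.5]        # desired output of a layer of p = 3 neurons
for _ in range(50):
    w = outstar_update(w, d)
# after enough steps, w is close to d
```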

Supervised Learning

As the name suggests, supervised learning takes place under the supervision of a teacher. This learning process is dependent. During the training of ANN under supervised learning, the input vector is presented to the network, which will produce an output vector. This output vector is compared with the desired/target output vector. An error signal is generated if there is a difference between the actual output and the desired/target output vector. On the basis of this error signal, the weights would be adjusted until the actual output is matched with the desired output.

Perceptron

Developed by Frank Rosenblatt using the McCulloch and Pitts model, the perceptron is the basic operational unit of artificial neural networks. It employs a supervised learning rule and is able to classify data into two classes.

Operational characteristics of the perceptron: It consists of a single neuron with an arbitrary number of inputs along with adjustable weights, but the output of the neuron is 1 or 0 depending upon the threshold. It also consists of a bias whose weight is always 1. The following figure gives a schematic representation of the perceptron.

[Figure: schematic representation of the perceptron]

The perceptron thus has the following three basic elements −

  1. Links − It would have a set of connection links, which carry weights, including a bias always having weight 1.

  2. Adder − It adds the inputs after they are multiplied with their respective weights.

  3. Activation function − It limits the output of the neuron. The most basic activation function is a Heaviside step function, which has two possible outputs. This function returns 1 if the input is positive, and 0 for any negative input.

Training Algorithm

Perceptron network can be trained for single output unit as well as multiple output units.

Training Algorithm for Single Output Unit

Step 1 − Initialize the following to start the training −

  1. Weights

  2. Bias

  3. Learning rate $\alpha$

For easy calculation and simplicity, weights and bias can be set equal to 0 and the learning rate can be set equal to 1.

Step 2 − Continue steps 3-8 while the stopping condition is not true.

Step 3 − Continue steps 4-6 for every training vector x.

Step 4 − Activate each input unit as follows −

x_{i}\:=\:s_{i}\:(i\:=\:1\:to\:n)

Step 5 − Now obtain the net input with the following relation −

y_{in}\:=\:b\:+\:\displaystyle\sum\limits_{i=1}^n x_{i}.w_{i}

Here 'b' is the bias and 'n' is the total number of input neurons.

Step 6 − Apply the following activation function to obtain the final output.

f(y_{in})\:=\:\begin{cases}1 & if\:y_{in}\:>\:\theta\\0 & if \: -\theta\:\leqslant\:y_{in}\:\leqslant\:\theta\\-1 & if\:y_{in}\:<\:-\theta \end{cases}

Step 7 − Adjust the weight and bias as follows −

Case 1 − if y ≠ t then,

w_{i}(new)\:=\:w_{i}(old)\:+\:\alpha\:tx_{i}

b(new)\:=\:b(old)\:+\:\alpha t

Case 2 − if y = t then,

w_{i}(new)\:=\:w_{i}(old)

b(new)\:=\:b(old)

Here 'y' is the actual output and 't' is the desired/target output.

Step 8 − Test for the stopping condition, which would happen when there is no change in weight.
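Steps 1-8 can be sketched as a small Python program (the bipolar AND training set and the threshold θ = 0.2 are illustrative assumptions, not from the text):

```python
def activation(y_in, theta):
    """Step 6: three-valued output 1 / 0 / -1 around the threshold theta."""
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0

def train_perceptron(samples, theta=0.2, alpha=1.0, max_epochs=100):
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0                       # Step 1: zero weights and bias
    for _ in range(max_epochs):                 # Step 2
        changed = False
        for x, t in samples:                    # Steps 3-4
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))  # Step 5
            y = activation(y_in, theta)         # Step 6
            if y != t:                          # Step 7, Case 1
                w = [wi + alpha * t * xi for wi, xi in zip(w, x)]
                b += alpha * t
                changed = True
        if not changed:                         # Step 8: weights stopped changing
            break
    return w, b

# bipolar AND function as a classic linearly separable example
samples = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
w, b = train_perceptron(samples)
```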

Training Algorithm for Multiple Output Units

The following diagram is the architecture of the perceptron for multiple output classes.

[Figure: perceptron architecture for multiple output classes]

Step 1 − Initialize the following to start the training −

  1. Weights

  2. Bias

  3. Learning rate $\alpha$

For easy calculation and simplicity, weights and bias can be set equal to 0 and the learning rate can be set equal to 1.

Step 2 − Continue steps 3-8 while the stopping condition is not true.

Step 3 − Continue steps 4-6 for every training vector x.

Step 4 − Activate each input unit as follows −

x_{i}\:=\:s_{i}\:(i\:=\:1\:to\:n)

Step 5 − Obtain the net input with the following relation −

y_{inj}\:=\:b_{j}\:+\:\displaystyle\sum\limits_{i=1}^n x_{i}\:w_{ij}

Here $b_{j}$ is the bias of output unit j and 'n' is the total number of input neurons.

Step 6 − Apply the following activation function to obtain the final output for each output unit j = 1 to m −

f(y_{inj})\:=\:\begin{cases}1 & if\:y_{inj}\:>\:\theta\\0 & if \: -\theta\:\leqslant\:y_{inj}\:\leqslant\:\theta\\-1 & if\:y_{inj}\:<\:-\theta \end{cases}

Step 7 − Adjust the weight and bias for i = 1 to n and j = 1 to m as follows −

Case 1 − if y_j ≠ t_j then,

w_{ij}(new)\:=\:w_{ij}(old)\:+\:\alpha\:t_{j}x_{i}

b_{j}(new)\:=\:b_{j}(old)\:+\:\alpha t_{j}

Case 2 − if y_j = t_j then,

w_{ij}(new)\:=\:w_{ij}(old)

b_{j}(new)\:=\:b_{j}(old)

Here 'y' is the actual output and 't' is the desired/target output.

Step 8 − Test for the stopping condition, which will happen when there is no change in weight.

Adaptive Linear Neuron (Adaline)

Adaline, which stands for Adaptive Linear Neuron, is a network having a single linear unit. It was developed by Widrow and Hoff in 1960. Some important points about Adaline are as follows −

  1. It uses a bipolar activation function.

  2. It uses delta rule for training to minimize the Mean-Squared Error (MSE) between the actual output and the desired/target output.

  3. The weights and the bias are adjustable.

Architecture

The basic structure of Adaline is similar to that of the perceptron, with an extra feedback loop with the help of which the actual output is compared with the desired/target output. After comparison on the basis of the training algorithm, the weights and bias are updated.

[Figure: Adaline architecture]

Training Algorithm

Step 1 − Initialize the following to start the training −

  1. Weights

  2. Bias

  3. Learning rate $\alpha$

For easy calculation and simplicity, weights and bias can be set equal to 0 and the learning rate can be set equal to 1.

Step 2 − Continue steps 3-8 while the stopping condition is not true.

Step 3 − Continue steps 4-6 for every bipolar training pair s:t.

Step 4 − Activate each input unit as follows −

x_{i}\:=\:s_{i}\:(i\:=\:1\:to\:n)

Step 5 − Obtain the net input with the following relation −

y_{in}\:=\:b\:+\:\displaystyle\sum\limits_{i=1}^n x_{i}\:w_{i}

Here 'b' is the bias and 'n' is the total number of input neurons.

Step 6 − Apply the following activation function to obtain the final output −

f(y_{in})\:=\:\begin{cases}1 & if\:y_{in}\:\geqslant\:0 \\-1 & if\:y_{in}\:<\:0 \end{cases}

Step 7 − Adjust the weight and bias as follows −

Case 1 − if y ≠ t then,

w_{i}(new)\:=\:w_{i}(old)\:+\: \alpha(t\:-\:y_{in})x_{i}

b(new)\:=\:b(old)\:+\: \alpha(t\:-\:y_{in})

Case 2 − if y = t then,

w_{i}(new)\:=\:w_{i}(old)

b(new)\:=\:b(old)

Here ‘y’ is the actual output and ‘t’ is the desired/target output.

$(t\:-\:y_{in})$ is the computed error.

Step 8 − Test for the stopping condition, which will happen when there is no change in weight, or when the highest weight change that occurred during training is smaller than the specified tolerance.
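The delta-rule loop above can be sketched in a few lines of NumPy. This is a minimal illustration, not part of the original tutorial; the bipolar AND task and all function names are chosen for the example.

```python
import numpy as np

def train_adaline(samples, targets, lr=0.1, epochs=50, tol=1e-3):
    """Steps 1-8: delta-rule training of a single Adaline unit."""
    w = np.zeros(samples.shape[1])               # Step 1: weights start at 0
    b = 0.0                                      # Step 1: bias starts at 0
    for _ in range(epochs):                      # Step 2: repeat until stopping
        max_dw = 0.0
        for x, t in zip(samples, targets):       # Steps 3-4: bipolar pairs s:t
            y_in = b + x @ w                     # Step 5: net input
            dw = lr * (t - y_in) * x             # Step 7: delta rule on the net input
            w += dw
            b += lr * (t - y_in)
            max_dw = max(max_dw, np.abs(dw).max())
        if max_dw < tol:                         # Step 8: change below tolerance
            break
    return w, b

def predict(w, b, x):
    return np.where(b + x @ w >= 0, 1, -1)       # Step 6: bipolar step activation

X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
T = np.array([1, -1, -1, -1])                    # bipolar AND
w, b = train_adaline(X, T)
print(predict(w, b, X))                          # reproduces T
```

The weights settle near w = (0.5, 0.5), b = −0.5, the least-mean-squares solution for bipolar AND.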

Multiple Adaptive Linear Neuron (Madaline)

Madaline, which stands for Multiple Adaptive Linear Neuron, is a network consisting of many Adalines in parallel. It has a single output unit. Some important points about Madaline are as follows −

  1. It is just like a multilayer perceptron, where the Adalines act as hidden units between the input and the Madaline layer.

  2. The weights and the bias between the input and Adaline layers, as we saw in the Adaline architecture, are adjustable.

  3. The Adaline and Madaline layers have fixed weights and bias of 1.

  4. Training can be done with the help of Delta rule.

Architecture

The architecture of Madaline consists of “n” neurons of the input layer, “m” neurons of the Adaline layer, and 1 neuron of the Madaline layer. The Adaline layer can be considered as the hidden layer as it is between the input layer and the output layer, i.e. the Madaline layer.

adaline

Training Algorithm

By now we know that only the weights and bias between the input and the Adaline layer are to be adjusted, and the weights and bias between the Adaline and the Madaline layer are fixed.

Step 1 − Initialize the following to start the training −

  1. Weights

  2. Bias

  3. Learning rate $\alpha$

For easy calculation and simplicity, the weights and bias are set equal to 0 and the learning rate is set equal to 1.

Step 2 − Continue steps 3-8 when the stopping condition is not true.

Step 3 − Continue steps 4-7 for every bipolar training pair s:t.

Step 4 − Activate each input unit as follows −

x_{i}\:=\:s_{i}\:(i\:=\:1\:to\:n)

Step 5 − Obtain the net input at each hidden unit, i.e. the Adaline layer, with the following relation −

Q_{inj}\:=\:b_{j}\:+\:\displaystyle\sum\limits_{i}^n x_{i}\:w_{ij}\:\:\:j\:=\:1\:to\:m

Here ‘b’ is the bias and ‘n’ is the total number of input neurons.

Step 6 − Apply the following activation function to obtain the final output at the Adaline and the Madaline layer −

f(x)\:=\:\begin{cases}1 & if\:x\:\geqslant\:0 \\-1 & if\:x\:<\:0 \end{cases}

Output at the hidden (Adaline) unit

Q_{j}\:=\:f(Q_{inj})

Final output of the network

y\:=\:f(y_{in})

i.e. $\:\:y_{in}\:=\:b_{0}\:+\:\sum_{j = 1}^m\:Q_{j}\:v_{j}$

Step 7 − Calculate the error and adjust the weights as follows −

Case 1 − if y ≠ t and t = 1 then,

w_{ij}(new)\:=\:w_{ij}(old)\:+\: \alpha(1\:-\:Q_{inj})x_{i}

b_{j}(new)\:=\:b_{j}(old)\:+\: \alpha(1\:-\:Q_{inj})

In this case, the weights would be updated on Qj where the net input is close to 0 because t = 1.

Case 2 − if y ≠ t and t = -1 then,

w_{ik}(new)\:=\:w_{ik}(old)\:+\: \alpha(-1\:-\:Q_{ink})x_{i}

b_{k}(new)\:=\:b_{k}(old)\:+\: \alpha(-1\:-\:Q_{ink})

In this case, the weights would be updated on Qk where the net input is positive because t = -1.

Here ‘y’ is the actual output and ‘t’ is the desired/target output.

Case 3 − if y = t then

There would be no change in weights.

Step 8 − Test for the stopping condition, which will happen when there is no change in weight, or when the highest weight change that occurred during training is smaller than the specified tolerance.
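The forward pass of Steps 4-6 can be sketched as follows. The XOR weights are hand-picked for illustration, not produced by the training loop above; the fixed Adaline-to-Madaline weights and bias of 1 follow the description of the architecture.

```python
import numpy as np

def madaline_forward(x, W, b, v, b0):
    """Steps 4-6: input -> Adaline layer -> single Madaline output unit."""
    f = lambda z: np.where(z >= 0, 1, -1)   # Step 6: bipolar step activation
    Q = f(b + x @ W)                        # Step 5: Adaline net inputs, then outputs
    return f(b0 + Q @ v)                    # fixed Adaline -> Madaline layer

# Hand-picked (not trained) weights solving XOR: each Adaline detects one
# input corner and the fixed output layer ORs the two detectors together.
W = np.array([[1.0, -1.0], [-1.0, 1.0]])   # input -> Adaline weights (adjustable)
b = np.array([-1.0, -1.0])                 # Adaline biases (adjustable)
v = np.array([1.0, 1.0])                   # Adaline -> Madaline weights (fixed at 1)
b0 = 1.0                                   # Madaline bias (fixed at 1)

X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
print(madaline_forward(X, W, b, v, b0))    # bipolar XOR of the two inputs
```

This also shows why Madaline is more powerful than a single Adaline: XOR is not linearly separable, but two Adalines plus the fixed combining unit represent it.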

Back Propagation Neural Networks

A Back Propagation Neural network (BPN) is a multilayer neural network consisting of an input layer, at least one hidden layer, and an output layer. As its name suggests, back propagation takes place in this network. The error calculated at the output layer, by comparing the target output and the actual output, is propagated back towards the input layer.

Architecture

As shown in the diagram, the architecture of BPN has three interconnected layers having weights on them. The hidden layer as well as the output layer also has bias, whose weight is always 1, on them. As is clear from the diagram, the working of BPN is in two phases. One phase sends the signal from the input layer to the output layer, and the other phase back propagates the error from the output layer to the input layer.

back propagation

Training Algorithm

For training, BPN uses the binary sigmoid activation function. The training of BPN has the following three phases.

  1. Phase 1 − Feed Forward Phase

  2. Phase 2 − Back Propagation of error

  3. Phase 3 − Updating of weights

All these steps are summarized in the algorithm as follows

Step 1 − Initialize the following to start the training −

  1. Weights

  2. Learning rate $\alpha$

For easy calculation and simplicity, take some small random values.

Step 2 − Continue steps 3-11 when the stopping condition is not true.

Step 3 − Continue steps 4-10 for every training pair.

Phase 1

Step 4 − Each input unit receives input signal xi and sends it to the hidden units, for all i = 1 to n.

Step 5 − Calculate the net input at the hidden unit using the following relation −

Q_{inj}\:=\:b_{0j}\:+\:\sum_{i=1}^n x_{i}v_{ij}\:\:\:\:j\:=\:1\:to\:p

Here b0j is the bias on hidden unit j, and vij is the weight on unit j of the hidden layer coming from unit i of the input layer.

Now calculate the net output by applying the following activation function

Q_{j}\:=\:f(Q_{inj})

Send these output signals of the hidden layer units to the output layer units.

Step 6 − Calculate the net input at the output layer unit using the following relation −

y_{ink}\:=\:b_{0k}\:+\:\sum_{j = 1}^p\:Q_{j}\:w_{jk}\:\:k\:=\:1\:to\:m

Here b0k is the bias on output unit k, and wjk is the weight on unit k of the output layer coming from unit j of the hidden layer.

Calculate the net output by applying the following activation function

y_{k}\:=\:f(y_{ink})

Phase 2

Step 7 − Compute the error correcting term, in correspondence with the target pattern received at each output unit, as follows −

\delta_{k}\:=\:(t_{k}\:-\:y_{k})f^{'}(y_{ink})

On this basis, update the weight and bias as follows −

\Delta w_{jk}\:=\:\alpha\:\delta_{k}\:Q_{j}

\Delta b_{0k}\:=\:\alpha \delta_{k}

Then, send $\delta_{k}$ back to the hidden layer.

Step 8 − Now each hidden unit sums its delta inputs from the output units −

\delta_{inj}\:=\:\displaystyle\sum\limits_{k=1}^m \delta_{k}\:w_{jk}

The error term can be calculated as follows −

\delta_{j}\:=\:\delta_{inj}f^{'}(Q_{inj})

On this basis, update the weight and bias as follows −

\Delta v_{ij}\:=\:\alpha\delta_{j}x_{i}

\Delta b_{0j}\:=\:\alpha\delta_{j}

Phase 3

Step 9 − Each output unit (yk, k = 1 to m) updates the weight and bias as follows −

w_{jk}(new)\:=\:w_{jk}(old)\:+\:\Delta w_{jk}

b_{0k}(new)\:=\:b_{0k}(old)\:+\:\Delta b_{0k}

Step 10 − Each hidden unit (Qj, j = 1 to p) updates the weight and bias as follows −

v_{ij}(new)\:=\:v_{ij}(old)\:+\:\Delta v_{ij}

b_{0j}(new)\:=\:b_{0j}(old)\:+\:\Delta b_{0j}

Step 11 − Check for the stopping condition, which may be either the number of epochs reached or the target output matches the actual output.
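The three phases can be sketched as one NumPy training loop. This is an illustrative batch-update sketch, not the tutorial's own code; the binary AND task, the layer size, and all names are chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))       # binary sigmoid; f'(z) = f(z)(1 - f(z))

def train_bpn(X, T, hidden=4, lr=0.5, epochs=10000):
    """Phase 1 feed forward, Phase 2 error back propagation, Phase 3 weight update."""
    n, p, m = X.shape[1], hidden, T.shape[1]
    V = rng.uniform(-0.5, 0.5, (n, p)); b_v = np.zeros(p)   # Step 1: small random values
    W = rng.uniform(-0.5, 0.5, (p, m)); b_w = np.zeros(m)
    for _ in range(epochs):                                 # Steps 2-3
        # Phase 1: feed forward (Steps 4-6)
        Q = sigmoid(b_v + X @ V)          # hidden activations Q_j
        Y = sigmoid(b_w + Q @ W)          # output activations y_k
        # Phase 2: back propagation of error (Steps 7-8)
        d_k = (T - Y) * Y * (1 - Y)       # delta_k = (t_k - y_k) f'(y_ink)
        d_j = (d_k @ W.T) * Q * (1 - Q)   # delta_j = delta_inj f'(Q_inj)
        # Phase 3: updating of weights (Steps 9-10), batched over all pairs
        W += lr * Q.T @ d_k; b_w += lr * d_k.sum(axis=0)
        V += lr * X.T @ d_j; b_v += lr * d_j.sum(axis=0)
    return V, b_v, W, b_w

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [0], [0], [1]], dtype=float)             # binary AND
V, b_v, W, b_w = train_bpn(X, T)
pred = sigmoid(b_w + sigmoid(b_v + X @ V) @ W)
print(np.round(pred).ravel())
```

Thresholding the trained outputs at 0.5 reproduces the AND targets.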

Generalized Delta Learning Rule

The delta rule works only for the output layer. On the other hand, the generalized delta rule, also called the back-propagation rule, is a way of creating the desired values for the hidden layer.

Mathematical Formulation

For the activation function $y_{k}\:=\:f(y_{ink})$, the net input at the output layer and at the hidden layer can be given by

y_{ink}\:=\:\displaystyle\sum\limits_{j}\:z_{j}w_{jk}

And $\:\:y_{inj}\:=\:\sum_i x_{i}v_{ij}$

Now the error which has to be minimized is

E\:=\:\frac{1}{2}\displaystyle\sum\limits_{k}\:[t_{k}\:-\:y_{k}]^2

By using the chain rule, we have

\frac{\partial E}{\partial w_{jk}}\:=\:\frac{\partial }{\partial w_{jk}}(\frac{1}{2}\displaystyle\sum\limits_{k}\:[t_{k}\:-\:y_{k}]^2)

=\:\frac{\partial }{\partial w_{jk}}\lgroup\frac{1}{2}[t_{k}\:-\:f(y_{ink})]^2\rgroup

=\:-[t_{k}\:-\:y_{k}]\frac{\partial }{\partial w_{jk}}f(y_{ink})

=\:-[t_{k}\:-\:y_{k}]f^{'}(y_{ink})\frac{\partial }{\partial w_{jk}}(y_{ink})

=\:-[t_{k}\:-\:y_{k}]f^{'}(y_{ink})z_{j}

Now let us say $\delta_{k}\:=\:-[t_{k}\:-\:y_{k}]f^{'}(y_{ink})$

The weights on connections to the hidden unit zj can be given by −

\frac{\partial E}{\partial v_{ij}}\:=\:- \displaystyle\sum\limits_{k} \delta_{k}\frac{\partial }{\partial v_{ij}}\:(y_{ink})

Putting the value of $y_{ink}$ we will get the following

\delta_{j}\:=\:-\displaystyle\sum\limits_{k}\delta_{k}w_{jk}f^{'}(z_{inj})

Weight updating can be done as follows −

For the output unit −

\Delta w_{jk}\:=\:-\alpha\frac{\partial E}{\partial w_{jk}}

=\:\alpha\:\delta_{k}\:z_{j}

For the hidden unit −

\Delta v_{ij}\:=\:-\alpha\frac{\partial E}{\partial v_{ij}}

=\:\alpha\:\delta_{j}\:x_{i}

These kinds of neural networks work on the basis of pattern association, which means they can store different patterns and, at the time of giving an output, produce one of the stored patterns by matching it with the given input pattern. This type of memory is also called Content-Addressable Memory (CAM). Associative memory makes a parallel search with the stored patterns as data files.

Unsupervised Learning

As the name suggests, this type of learning is done without the supervision of a teacher. This learning process is independent. During the training of an ANN under unsupervised learning, input vectors of a similar type are combined to form clusters. When a new input pattern is applied, the neural network gives an output response indicating the class to which the input pattern belongs. In this, there would be no feedback from the environment as to what the desired output should be and whether it is correct or incorrect. Hence, in this type of learning the network itself must discover the patterns and features in the input data, and the relation of the input data to the output.

Winner-Takes-All Networks

These kinds of networks are based on the competitive learning rule and use the strategy of choosing the neuron with the greatest total input as the winner. The connections between the output neurons show the competition between them: one of them will be ‘ON’, which means it is the winner, and the others will be ‘OFF’.

Following are some of the networks based on this simple concept, using unsupervised learning.

Hamming Network

In most of the neural networks using unsupervised learning, it is essential to compute distances and perform comparisons. One such network is the Hamming network, where every given input vector is clustered into one of different groups. Following are some important features of Hamming networks −

  1. Lippmann started working on Hamming networks in 1987.

  2. It is a single layer network.

  3. The inputs can be either binary {0, 1} or bipolar {-1, 1}.

  4. The weights of the net are calculated by the exemplar vectors.

  5. It is a fixed weight network which means the weights would remain the same even during training.

Max Net

This is also a fixed weight network, which serves as a subnet for selecting the node having the highest input. All the nodes are fully interconnected and there exists symmetrical weights in all these weighted interconnections.

Architecture

max net

It uses a mechanism which is an iterative process: each node receives inhibitory inputs from all other nodes through connections. The single node whose value is maximum will be active, the winner, and the activations of all other nodes will become inactive. Max Net uses the identity activation function with f(x)\:=\:\begin{cases}x & if\:x > 0\\0 & if\:x \leq 0\end{cases}

The task of this net is accomplished by a self-excitation weight of +1 and a mutual inhibition magnitude, which is set as [0 < ɛ < $\frac{1}{m}$], where “m” is the total number of nodes.

Competitive Learning in ANN

It is concerned with unsupervised training, in which the output nodes try to compete with each other to represent the input pattern. To understand this learning rule, we must understand the competitive net, which is explained as follows −

Basic Concept of Competitive Network

This network is just like a single-layer feed-forward network with feedback connections between the outputs. The connections between the outputs are of inhibitory type, shown by dotted lines, which means the competitors never support themselves.

basic concept

Basic Concept of Competitive Learning Rule

As said earlier, there would be competition among the output nodes, so the main concept is − during training, the output unit that has the highest activation for a given input pattern will be declared the winner. This rule is also called Winner-takes-all because only the winning neuron is updated and the rest of the neurons are left unchanged.

Mathematical Formulation

Following are the three important factors for the mathematical formulation of this learning rule −

  1. Condition to be a winner Suppose if a neuron yk wants to be the winner, then there would be the following condition y_{k}\:=\:\begin{cases}1 & if\:v_{k} > v_{j}\:for\:all\:\:j,\:j\:\neq\:k\\0 & otherwise\end{cases} It means that if any neuron, say, yk wants to win, then its induced local field (the output of the summation unit), say vk, must be the largest among all the other neurons in the network.

  2. Condition of the sum total of weight Another constraint over the competitive learning rule is the sum total of weights to a particular output neuron is going to be 1. For example, if we consider neuron k then \displaystyle\sum\limits_{k} w_{kj}\:=\:1\:\:\:\:for\:all\:\:k

  3. Change of weight for the winner If a neuron does not respond to the input pattern, then no learning takes place in that neuron. However, if a particular neuron wins, then the corresponding weights are adjusted as follows − \Delta w_{kj}\:=\:\begin{cases}\alpha(x_{j}\:-\:w_{kj}), & if\:neuron\:k\:wins\\0 & if\:neuron\:k\:loses\end{cases} Here $\alpha$ is the learning rate. This clearly shows that we are favoring the winning neuron by adjusting its weight towards the input pattern, and if a neuron loses, then we need not bother to re-adjust its weight.
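A single competitive-learning step can be sketched as below. Picking the winner by smallest Euclidean distance corresponds to the greatest net input when the weight vectors are normalized; the function name and sample vectors are illustrative, not from the original.

```python
import numpy as np

def competitive_step(W, x, lr=0.5):
    """One winner-takes-all update: only the winning row of W learns."""
    k = int(np.argmin(np.linalg.norm(W - x, axis=1)))   # winner: closest weight vector
    W[k] += lr * (x - W[k])                             # move the winner toward x
    return k

W = np.array([[0.0, 0.0], [1.0, 1.0]])
k = competitive_step(W, np.array([0.9, 1.1]))
print(k, W[k])   # neuron 1 wins and moves to [0.95, 1.05]
```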

K-means Clustering Algorithm

K-means is one of the most popular clustering algorithms, in which we use the concept of a partition procedure. We start with an initial partition and repeatedly move patterns from one cluster to another, until we get a satisfactory result.

Algorithm

Step 1 − Select k points as the initial centroids. Initialize k prototypes (w1,…,wk); for example, we can identify them with randomly chosen input vectors −

W_{j}\:=\:i_{p},\:\:\: where\:j\:\in \lbrace1,…,k\rbrace\:and\:p\:\in \lbrace1,…,n\rbrace

Each cluster Cj is associated with prototype wj.

Step 2 − Repeat steps 3-5 until E no longer decreases, or the cluster membership no longer changes.

Step 3 − For each input vector ip where p ∈ {1,…,n}, put ip in the cluster Cj* with the nearest prototype wj*, i.e. satisfying the following relation

|i_{p}\:-\:w_{j*}|\:\leq\:|i_{p}\:-\:w_{j}|,\:j\:\in \lbrace1,…,k\rbrace

Step 4 − For each cluster Cj, where j ∈ {1,…,k}, update the prototype wj to be the centroid of all samples currently in Cj, so that

w_{j}\:=\:\sum_{i_{p}\in C_{j}}\frac{i_{p}}{|C_{j}|}

Step 5 − Compute the total quantization error as follows −

E\:=\:\sum_{j=1}^k\sum_{i_{p}\in w_{j}}|i_{p}\:-\:w_{j}|^2
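Steps 1-5 translate directly into NumPy; a minimal sketch (random prototype initialization per Step 1, function and variable names illustrative):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Steps 1-5 of the K-means procedure above."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), k, replace=False)].copy()  # Step 1: prototypes from inputs
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):                              # Step 2
        d = np.linalg.norm(X[:, None] - W[None], axis=2)
        labels = d.argmin(axis=1)                       # Step 3: nearest prototype
        new_W = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else W[j]
                          for j in range(k)])           # Step 4: centroid update
        if np.allclose(new_W, W):                       # membership/centroids stable
            break
        W = new_W
    E = ((X - W[labels]) ** 2).sum()                    # Step 5: total quantization error
    return W, labels, E

X = np.array([[0, 0], [0, 1], [5, 5], [5, 6]], dtype=float)
W, labels, E = kmeans(X, 2)
print(labels)   # the two tight pairs end up in different clusters
```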

Neocognitron

It is a multilayer feed-forward network, developed by Fukushima in the 1980s. This model is based on supervised learning and is used for visual pattern recognition, mainly hand-written characters. It is basically an extension of the Cognitron network, which was also developed by Fukushima in 1975.

Architecture

It is a hierarchical network comprising many layers, with a local pattern of connectivity in those layers.

neocognitron

As we have seen in the above diagram, the neocognitron is divided into different connected layers, and each layer has two types of cells. Explanation of these cells is as follows −

S-Cell − It is called a simple cell, which is trained to respond to a particular pattern or a group of patterns.

C-Cell − It is called a complex cell, which combines the output from the S-cells and simultaneously lessens the number of units in each array. In another sense, the C-cell displaces the result of the S-cell.

Training Algorithm

Training of the neocognitron progresses layer by layer. The weights from the input layer to the first layer are trained and frozen. Then, the weights from the first layer to the second layer are trained, and so on. The internal calculations between S-cells and C-cells depend upon the weights coming from the previous layers. Hence, we can say that the training algorithm depends upon the calculations on S-cells and C-cells.

Calculations in S-cell

The S-cell possesses the excitatory signal received from the previous layer and inhibitory signals obtained within the same layer.

\theta=\:\sqrt{\sum\sum t_{i} c_{i}^2}

Here, ti is the fixed weight and ci is the output from the C-cell.

The scaled input of the S-cell can be calculated as follows −

x\:=\:\frac{1\:+\:e}{1\:+\:vw_{0}}\:-\:1

Here, $e\:=\:\sum_i c_{i}w_{i}$

wi is the weight adjusted from the C-cell to the S-cell.

w0 is the adjustable weight between the input and the S-cell.

v is the excitatory input from the C-cell.

The activation of the output signal is,

s\:=\:\begin{cases}x, & if\:x \geq 0\\0, & if\:x < 0\end{cases}

Calculations in C-cell

The net input of the C-layer is

C\:=\:\displaystyle\sum\limits_i s_{i}x_{i}

Here, si is the output from the S-cell and xi is the fixed weight from the S-cell to the C-cell.

The final output is as follows −

C_{out}\:=\:\begin{cases}\frac{C}{a+C}, & if\:C > 0\\0, & otherwise\end{cases}

Here ‘a’ is a parameter that depends on the performance of the network.
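The S-cell and C-cell formulas above can be checked with a small numeric sketch; all values below are illustrative, not taken from the original.

```python
import numpy as np

def s_cell(c, w, v, w0):
    """Scaled S-cell input x = (1 + e) / (1 + v*w0) - 1 with e = sum_i c_i w_i,
    followed by the activation s = x if x >= 0 else 0."""
    e = float(np.dot(c, w))              # excitatory input from the C-cells
    x = (1 + e) / (1 + v * w0) - 1       # scaled input
    return max(x, 0.0)

def c_cell(s, x_fixed, a=0.5):
    """C-layer net input C = sum_i s_i x_i and output C / (a + C) when C > 0."""
    C = float(np.dot(s, x_fixed))
    return C / (a + C) if C > 0 else 0.0

s = s_cell(np.array([0.2, 0.8]), np.array([0.5, 0.5]), v=0.4, w0=0.5)
out = c_cell(np.array([s, 0.0]), np.array([1.0, 1.0]))
print(s, out)
```

With e = 0.5 and v·w0 = 0.2 the scaled input is x = 1.5/1.2 − 1 = 0.25, and the C-cell squashes C = 0.25 to 0.25/0.75 = 1/3.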

Learning Vector Quantization

Learning Vector Quantization (LVQ), different from vector quantization (VQ) and Kohonen Self-Organizing Maps (KSOM), is basically a competitive network which uses supervised learning. We may define it as a process of classifying patterns where each output unit represents a class. As it uses supervised learning, the network is given a set of training patterns with known classification, along with an initial distribution of the output classes. After completing the training process, LVQ classifies an input vector by assigning it to the same class as that of the winning output unit.

Architecture

The following figure shows the architecture of LVQ, which is quite similar to the architecture of KSOM. As we can see, there are “n” input units and “m” output units. The layers are fully interconnected, with weights on the connections.

layers

Parameters Used

Following are the parameters used in the LVQ training process as well as in the flowchart −

  1. x = training vector (x1,…​,xi,…​,xn)

  2. T = class for training vector x

  3. wj = weight vector for jth output unit

  4. Cj = class associated with the jth output unit

Training Algorithm

Step 1 − Initialize reference vectors, which can be done as follows −

  1. Step 1(a) − From the given set of training vectors, take the first “m” (number of clusters) training vectors and use them as weight vectors. The remaining vectors can be used for training.

  2. Step 1(b) − Assign the initial weight and classification randomly.

  3. Step 1(c) − Apply K-means clustering method.

Step 2 − Initialize the learning rate $\alpha$.

Step 3 − Continue with steps 4-9 if the condition for stopping this algorithm is not met.

Step 4 − Follow steps 5-6 for every training input vector x.

Step 5 − Calculate the square of the Euclidean distance for j = 1 to m −

D(j)\:=\:\displaystyle\sum\limits_{i=1}^n (x_{i}\:-\:w_{ij})^2

Step 6 − Obtain the winning unit J where D(j) is minimum.

Step 7 − Calculate the new weight of the winning unit by the following relation −

if T = CJ then $w_{J}(new)\:=\:w_{J}(old)\:+\:\alpha[x\:-\:w_{J}(old)]$

if T ≠ CJ then $w_{J}(new)\:=\:w_{J}(old)\:-\:\alpha[x\:-\:w_{J}(old)]$

Step 8 − Reduce the learning rate $\alpha$.

Step 9 − Test for the stopping condition. It may be as follows −

  1. Maximum number of epochs reached.

  2. Learning rate reduced to a negligible value.
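Steps 1-9 can be sketched as follows; per Step 1(a), the first vector of each class is used as a reference vector and the remaining vectors are used for training. The tiny two-class data set and all names are illustrative.

```python
import numpy as np

def lvq_train(X, T, W, C, lr=0.1, epochs=10, decay=0.9):
    """LVQ1 sketch of Steps 3-9 above."""
    for _ in range(epochs):                              # Step 3: until stopping
        for x, t in zip(X, T):                           # Step 4: each training vector
            J = np.argmin(((x - W) ** 2).sum(axis=1))    # Steps 5-6: winning unit
            if t == C[J]:                                # Step 7: same class ->
                W[J] += lr * (x - W[J])                  #   move toward x
            else:                                        # different class ->
                W[J] -= lr * (x - W[J])                  #   move away from x
        lr *= decay                                      # Step 8: reduce learning rate
    return W

def lvq_classify(x, W, C):
    return C[np.argmin(((x - W) ** 2).sum(axis=1))]

# Step 1(a): first vector of each class becomes a reference vector,
# the remaining vectors are the training set.
W = np.array([[0.0, 0.0], [1.0, 1.0]]); C = np.array([0, 1])
X = np.array([[0.2, 0.1], [0.9, 1.1]]); T = np.array([0, 1])
lvq_train(X, T, W, C)
print(lvq_classify(np.array([0.1, 0.0]), W, C))   # class 0
```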

Flowchart

flowchart

Variants

Three other variants, namely LVQ2, LVQ2.1 and LVQ3, have been developed by Kohonen. The complexity in all these three variants is greater than in LVQ, due to the concept that the winner as well as the runner-up unit will learn.

LVQ2

As discussed above for the other variants of LVQ, the condition in LVQ2 is formed by a window. This window is based on the following parameters −

  1. x − the current input vector

  2. yc − the reference vector closest to x

  3. yr − the other reference vector, which is next closest to x

  4. dc − the distance from x to yc

  5. dr − the distance from x to yr

The input vector x falls in the window, if

\frac{d_{c}}{d_{r}}\:>\:1\:-\:\theta\:\:and\:\:\frac{d_{r}}{d_{c}}\:>\:1\:+\:\theta

Here, $\theta$ is the relative width of the window.

Updating can be done with the following formula −

$y_{c}(t\:+\:1)\:=\:y_{c}(t)\:-\:\alpha(t)[x(t)\:-\:y_{c}(t)]$ (yc belongs to a different class than x)

$y_{r}(t\:+\:1)\:=\:y_{r}(t)\:+\:\alpha(t)[x(t)\:-\:y_{r}(t)]$ (yr belongs to the same class as x)

Here $\alpha$ is the learning rate.

LVQ2.1

In LVQ2.1, we take the two closest vectors, namely yc1 and yc2, and the condition for the window is as follows −

Min\begin{bmatrix}\frac{d_{c1}}{d_{c2}},\frac{d_{c2}}{d_{c1}}\end{bmatrix}\:>\:(1\:-\:\theta)

Max\begin{bmatrix}\frac{d_{c1}}{d_{c2}},\frac{d_{c2}}{d_{c1}}\end{bmatrix}\:<\:(1\:+\:\theta)

Updating can be done with the following formula −

$y_{c1}(t\:+\:1)\:=\:y_{c1}(t)\:+\:\alpha(t)[x(t)\:-\:y_{c1}(t)]$ (yc1 belongs to the same class as x)

$y_{c2}(t\:+\:1)\:=\:y_{c2}(t)\:-\:\alpha(t)[x(t)\:-\:y_{c2}(t)]$ (yc2 belongs to a different class)

Here, $\alpha$ is the learning rate.

LVQ3

In LVQ3, we take the two closest vectors, namely yc1 and yc2, and the condition for the window is as follows −

Min\begin{bmatrix}\frac{d_{c1}}{d_{c2}},\frac{d_{c2}}{d_{c1}}\end{bmatrix}\:>\:(1\:-\:\theta)(1\:+\:\theta)

Here $\theta\approx 0.2$

Updating can be done with the following formula, applied when x, yc1 and yc2 all belong to the same class −

$y_{c1}(t\:+\:1)\:=\:y_{c1}(t)\:+\:\beta(t)[x(t)\:-\:y_{c1}(t)]$

$y_{c2}(t\:+\:1)\:=\:y_{c2}(t)\:+\:\beta(t)[x(t)\:-\:y_{c2}(t)]$

Here $\beta$ is a multiple of the learning rate $\alpha$, with $\beta\:=\:m \alpha(t)$ for 0.1 < m < 0.5.

Adaptive Resonance Theory

This network was developed by Stephen Grossberg and Gail Carpenter in 1987. It is based on competition and uses an unsupervised learning model. Adaptive Resonance Theory (ART) networks, as the name suggests, are always open to new learning (adaptive) without losing the old patterns (resonance). Basically, an ART network is a vector classifier which accepts an input vector and classifies it into one of the categories depending upon which of the stored patterns it resembles the most.

Operating Principal

The main operation of ART classification can be divided into the following phases −

  1. Recognition phase − The input vector is compared with the classification presented at every node in the output layer. The output of the neuron becomes “1” if it best matches with the classification applied, otherwise it becomes “0”.

  2. Comparison phase − In this phase, a comparison of the input vector to the comparison layer vector is done. The condition for reset is that the degree of similarity would be less than vigilance parameter.

  3. Search phase − In this phase, the network searches for a reset as well as for the match done in the above phases. Hence, if there is no reset and the match is quite good, then the classification is over. Otherwise, the process is repeated, and other stored patterns must be tried to find the correct match.
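The comparison phase can be sketched with the usual ART1 match ratio, ||x AND tJ|| / ||x||, compared against the vigilance parameter. The exact form of this test is an assumption here, since the phases above describe it only qualitatively; names and values are illustrative.

```python
import numpy as np

def vigilance_test(x, t_j, rho):
    """Comparison-phase sketch (assumed ART1 match criterion): accept the match
    only when the similarity ratio reaches the vigilance parameter rho."""
    x = np.asarray(x)
    match = np.logical_and(x, t_j).sum()   # components active in both binary vectors
    return match / x.sum() >= rho          # ratio below rho would trigger a reset

print(vigilance_test([1, 1, 0, 1], [1, 0, 0, 1], rho=0.6))   # 2/3 >= 0.6 -> True
```

Raising the vigilance (e.g. rho = 0.8 here) makes the same pair fail the test, forcing a reset and a search for another cluster.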

ART1

It is a type of ART which is designed to cluster binary vectors. We can understand this from its architecture.

Architecture of ART1

It consists of the following two units −

Computational Unit − It is made up of the following −

  1. Input unit (F1 layer) − It further has the following two portions − F1(a) layer (Input portion) − In ART1, there is no processing in this portion; it only holds the input vectors. It is connected to the F1(b) layer (interface portion). F1(b) layer (Interface portion) − This portion combines the signal from the input portion with that of the F2 layer. The F1(b) layer is connected to the F2 layer through the bottom-up weights bij, and the F2 layer is connected to the F1(b) layer through the top-down weights tji.

  2. Cluster Unit (F2 layer) − This is a competitive layer. The unit having the largest net input is selected to learn the input pattern. The activations of all other cluster units are set to 0.

  3. Reset Mechanism − The work of this mechanism is based upon the similarity between the top-down weight and the input vector. Now, if the degree of this similarity is less than the vigilance parameter, then the cluster is not allowed to learn the pattern and a reset would happen.

Supplement Unit − 重置机制的问题实际上在于,第 F2 层必须在某些条件下受到抑制,并且在发生某些学习时也必须可用。这就是加 gain control units 两个补充单元即 G1G2 以及重置单元 R 的原因。这些单元接收并向网络中存在的其他单元发送信号。 ‘+’ 指示激励信号,而 ‘−’ 指示抑制信号。

Supplement Unit − Actually, the issue with the reset mechanism is that the layer F2 must be inhibited under certain conditions and must also be available when some learning happens. That is why two supplemental units, namely G1 and G2, are added along with the reset unit R. They are called gain control units. These units receive and send signals to the other units present in the network. ‘+’ indicates an excitatory signal, while ‘−’ indicates an inhibitory signal.

supplement units

Parameters Used

使用以下参数 −

Following parameters are used −

  1. n − Number of components in the input vector

  2. m − Maximum number of clusters that can be formed

  3. bij − Weight from F1(b) to F2 layer, i.e. bottom-up weights

  4. tji − Weight from F2 to F1(b) layer, i.e. top-down weights

  5. ρ − Vigilance parameter

  6. ||x|| − Norm of vector x

Algorithm

Step 1 − 初始化学习率、警觉性参数和权重如下 −

Step 1 − Initialize the learning rate, the vigilance parameter, and the weights as follows −

\alpha\:>\:1\:\:and\:\:0\:<\rho\:\leq\:1

0\:<\:b_{ij}(0)\:<\:\frac{\alpha}{\alpha\:-\:1\:+\:n}\:\:and\:\:t_{ji}(0)\:=\:1

Step 2 − 当停止条件为假时,继续执行步骤 3-9。

Step 2 − Continue step 3-9, when the stopping condition is not true.

Step 3 − 为每个训练输入继续执行步骤 4-6。

Step 3 − Continue step 4-6 for every training input.

Step 4 − 将所有 F1(a) 和 F1 单元的激活设置如下

Step 4 − Set activations of all F1(a) and F1 units as follows

F2 = 0 and F1(a) = input vectors

Step 5 − 从 F1(a) 到 F1(b) 层的输入信号必须像

Step 5 − Input signal from F1(a) to F1(b) layer must be sent like

s_{i}\:=\:x_{i}

Step 6 − 对于每个抑制的 F2 节点

Step 6 − For every inhibited F2 node

$y_{j}\:=\:\sum_ib_{ij}x_{i}$ 条件为 yj ≠ -1

$y_{j}\:=\:\sum_i b_{ij}x_{i}$ the condition is yj ≠ -1

Step 7 − 当重置为真时,执行步骤 8-10。

Step 7 − Perform step 8-10, when the reset is true.

Step 8 − 找到 J,使得对于所有节点 j 都有 yJ ≥ yj。

Step 8 − Find J such that yJ ≥ yj for all nodes j.

Step 9 - 再按如下方式计算F1(b)上的激活

Step 9 − Again calculate the activation on F1(b) as follows

x_{i}\:=\:s_{i}t_{Ji}

Step 10 - 在计算出矢量 x 的范数和矢量 s 的范数后,需要按如下方式检查重置条件 -

Step 10 − Now, after calculating the norm of vector x and vector s, we need to check the reset condition as follows −

如果 ||x||/ ||s|| < 警觉参数 ρ ,则抑制节点 J 并转到步骤7

If ||x||/||s|| < vigilance parameter ρ, then inhibit node J and go to step 7.

否则,如果 ||x||/ ||s|| ≥ 警觉参数 ρ ,则继续。

Else If ||x||/ ||s|| ≥ vigilance parameter ρ, then proceed further.

Step 11 - 节点 J 的权重更新可按如下方式进行 -

Step 11 − Weight updating for node J can be done as follows −

b_{iJ}(new)\:=\:\frac{\alpha x_{i}}{\alpha\:-\:1\:+\:||x||}

t_{Ji}(new)\:=\:x_{i}

Step 12 - 必须检查算法的停止条件,它可能如下 -

Step 12 − The stopping condition for algorithm must be checked and it may be as follows −

  1. There is no change in weights.

  2. Reset is not performed for units.

  3. Maximum number of epochs reached.
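The algorithm above can be condensed into a short Python sketch. It is illustrative only: the initial bottom-up weight value, the tie-breaking by argmax, and the handling of the case where every cluster is inhibited are assumptions, not part of the tutorial.

```python
# Minimal ART1 sketch following Steps 1-12 above (illustrative helper names).
import numpy as np

def art1(inputs, n, m, rho=0.5, alpha=2.0):
    """Cluster binary vectors with ART1; returns one cluster index per input."""
    # Step 1: initialize weights within the prescribed bounds
    b = np.full((n, m), alpha / (alpha - 1 + n) * 0.5)  # bottom-up weights bij
    t = np.ones((m, n))                                 # top-down weights tji
    labels = []
    for s in inputs:                       # Step 3: each training input
        s = np.asarray(s, dtype=float)
        inhibited = set()
        while True:
            y = b.T @ s                    # Step 6: net input to F2 units
            for j in inhibited:
                y[j] = -1.0
            J = int(np.argmax(y))          # Step 8: winning unit J
            x = s * t[J]                   # Step 9: F1(b) activation
            # Step 10: vigilance test ||x|| / ||s|| >= rho
            if np.sum(s) > 0 and np.sum(x) / np.sum(s) >= rho:
                # Step 11: update the winner's weights
                b[:, J] = (alpha * x) / (alpha - 1 + np.sum(x))
                t[J] = x
                labels.append(J)
                break
            inhibited.add(J)               # reset: inhibit node J, retry
            if len(inhibited) == m:        # no unit passes vigilance
                labels.append(-1)
                break
    return labels
```

Presenting the same binary vector twice yields the same cluster index, while a non-overlapping vector opens a new cluster.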

Kohonen Self-Organizing Feature Maps

假设我们有一些任意维度的模式,但是我们需要一维或二维的模式。然后,特征映射的过程对于将宽模式空间转换到典型特征空间非常有用。现在,问题出现了为什么需要自组织特征图?原因在于,除了将任意维度转换为 1-D 或 2-D 的能力外,它还必须具有保持邻域拓扑的能力。

Suppose we have patterns of arbitrary dimensions, but we need them in one or two dimensions. Then the process of feature mapping is very useful for converting the wide pattern space into a typical feature space. Now, the question arises: why do we require a self-organizing feature map? The reason is that, along with the capability to convert arbitrary dimensions into 1-D or 2-D, it must also have the ability to preserve the neighborhood topology.

Neighbor Topologies in Kohonen SOM

可以有各种拓扑,但以下两种拓扑使用最多 -

There can be various topologies, however the following two topologies are used the most −

Rectangular Grid Topology

此拓扑在距离 2 格中有 24 个节点,在距离 1 格中有 16 个节点,在距离 0 格中有 8 个节点,这意味着每个矩形格之间的差为 8 个节点。胜利单元用 # 表示。

This topology has 24 nodes in the distance-2 grid, 16 nodes in the distance-1 grid, and 8 nodes in the distance-0 grid, which means the difference between each rectangular grid is 8 nodes. The winning unit is indicated by #.

rectangular

Hexagonal Grid Topology

此拓扑在距离 2 格中有 18 个节点,在距离 1 格中有 12 个节点,在距离 0 格中有 6 个节点,这意味着每个六边形格之间的差为 6 个节点。胜利单元用 # 表示。

This topology has 18 nodes in the distance-2 grid, 12 nodes in the distance-1 grid, and 6 nodes in the distance-0 grid, which means the difference between each hexagonal grid is 6 nodes. The winning unit is indicated by #.

hexagonal

Architecture

KSOM 的体系结构类似于竞争网络的体系结构。在前面讨论的邻域方案的帮助下,可以在网络的扩展区域上进行训练。

The architecture of KSOM is similar to that of the competitive network. With the help of neighborhood schemes, discussed earlier, the training can take place over the extended region of the network.

ksom

Algorithm for training

Step 1 - 初始化权重、学习率 α 和邻域拓扑方案。

Step 1 − Initialize the weights, the learning rate α and the neighborhood topological scheme.

Step 2 − 当停止条件为假时,继续执行步骤 3-9。

Step 2 − Continue step 3-9, when the stopping condition is not true.

Step 3 - 对每个输入矢量 x 继续步骤 4-6。

Step 3 − Continue step 4-6 for every input vector x.

Step 4 − 计算 j = 1 to m 的欧几里得距离的平方

Step 4 − Calculate Square of Euclidean Distance for j = 1 to m

D(j)\:=\:\displaystyle\sum\limits_{i=1}^n (x_{i}\:-\:w_{ij})^2

Step 5 − 获取获胜元 J ,其中 D(j) 为最小值。

Step 5 − Obtain the winning unit J where D(j) is minimum.

Step 6 − 利用以下关系计算获胜单元的新权重 −

Step 6 − Calculate the new weight of the winning unit by the following relation −

w_{ij}(new)\:=\:w_{ij}(old)\:+\:\alpha[x_{i}\:-\:w_{ij}(old)]

Step 7 − 利用以下关系更新学习率 α

Step 7 − Update the learning rate α by the following relation −

\alpha(t\:+\:1)\:=\:0.5\:\alpha(t)

Step 8 − 减小拓扑模式的半径。

Step 8 − Reduce the radius of topological scheme.

Step 9 − 检查网络的停止条件。

Step 9 − Check for the stopping condition for the network.
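The training loop above can be sketched as follows. A one-dimensional chain neighborhood and the halving schedules for the learning rate and radius are simplifying assumptions:

```python
# Compact KSOM training sketch following Steps 1-9 above.
import numpy as np

def train_ksom(data, m, epochs=20, alpha=0.5, radius=1, seed=0):
    """Train m units on the input vectors; returns the (m x d) weight matrix."""
    rng = np.random.default_rng(seed)
    w = rng.random((m, data.shape[1]))            # Step 1: initialize weights
    for _ in range(epochs):                       # Step 2: stopping condition
        for x in data:                            # Step 3: each input vector
            d = ((x - w) ** 2).sum(axis=1)        # Step 4: squared distances
            J = int(np.argmin(d))                 # Step 5: winning unit J
            lo, hi = max(0, J - radius), min(m, J + radius + 1)
            w[lo:hi] += alpha * (x - w[lo:hi])    # Step 6: update neighborhood
        alpha *= 0.5                              # Step 7: decay learning rate
        radius = max(0, radius - 1)               # Step 8: shrink the radius
    return w
```

With two well-separated clusters of inputs and m = 2 units, the two units end up as the two cluster prototypes.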

Associate Memory Network

这类神经网络基于模式关联工作,这意味着它们可以存储不同的模式,并且在给出输出时,可以通过将给定的输入模式与存储的模式进行匹配来产生其中一个存储的模式。这类存储器也称为内容可寻址存储器 (CAM)。联想存储器以存储的模式作为数据文件进行并行搜索。

These kinds of neural networks work on the basis of pattern association, which means they can store different patterns and at the time of giving an output they can produce one of the stored patterns by matching them with the given input pattern. These types of memories are also called Content-Addressable Memory (CAM). Associative memory makes a parallel search with the stored patterns as data files.

以下是我们可以观察到的两种类型的联想式内存 −

Following are the two types of associative memories we can observe −

  1. Auto Associative Memory

  2. Hetero Associative memory

Auto Associative Memory

这是一个单层神经网络,其中输入训练向量和输出目标向量是相同的。权重被确定下来,以便网络存储一组模式。

This is a single layer neural network in which the input training vector and the output target vectors are the same. The weights are determined so that the network stores a set of patterns.

Architecture

如下图所示,自动联想式内存网络的结构具有 ‘n’ 个输入训练向量和类似的 ‘n’ 个输出目标向量。

As shown in the following figure, the architecture of Auto Associative memory network has ‘n’ number of input training vectors and similar ‘n’ number of output target vectors.

auto associative memory

Training Algorithm

为了进行训练,该网络使用希布尔或德尔塔学习规则。

For training, this network is using the Hebb or Delta learning rule.

Step 1 − 将所有权重初始化为零,如 wij = 0 (i = 1 to n, j = 1 to n)

Step 1 − Initialize all the weights to zero as wij = 0 (i = 1 to n, j = 1 to n)

Step 2 − 对每个输入向量执行步骤 3-4。

Step 2 − Perform steps 3-4 for each input vector.

Step 3 − 激活每个输入单元,如下所示 −

Step 3 − Activate each input unit as follows −

x_{i}\:=\:s_{i}\:(i\:=\:1\:to\:n)

Step 4 − 激活每个输出单元,如下所示 −

Step 4 − Activate each output unit as follows −

y_{j}\:=\:s_{j}\:(j\:=\:1\:to\:n)

Step 5 − 调整权重,如下所示 −

Step 5 − Adjust the weights as follows −

w_{ij}(new)\:=\:w_{ij}(old)\:+\:x_{i}y_{j}

Testing Algorithm

Step 1 − 为希布尔规则设置在训练期间获得的权重。

Step 1 − Set the weights obtained during training for Hebb’s rule.

Step 2 − 对每个输入向量执行步骤 3-5。

Step 2 − Perform steps 3-5 for each input vector.

Step 3 − 将输入单元的激活设置为等于输入向量的激活。

Step 3 − Set the activation of the input units equal to that of the input vector.

Step 4 - 为每个输出单元计算净输入 j = 1 to n

Step 4 − Calculate the net input to each output unit j = 1 to n

y_{inj}\:=\:\displaystyle\sum\limits_{i=1}^n x_{i}w_{ij}

Step 5 - 应用以下激活函数来计算输出

Step 5 − Apply the following activation function to calculate the output

y_{j}\:=\:f(y_{inj})\:=\:\begin{cases}+1 & if\:y_{inj}\:>\:0\\-1 & if\:y_{inj}\:\leqslant\:0\end{cases}

Hetero Associative memory

类似于自动关联记忆网络,这也是一个单层神经网络。然而,在这个网络中,输入训练向量和输出目标向量并不相同。权重被确定下来,以使网络存储一组模式。异质关联网络本质上是静态的,因此不会有非线性和延迟操作。

Similar to Auto Associative Memory network, this is also a single layer neural network. However, in this network the input training vector and the output target vectors are not the same. The weights are determined so that the network stores a set of patterns. Hetero associative network is static in nature, hence, there would be no non-linear and delay operations.

Architecture

如下图所示,异质关联存储网络的架构具有 ‘n’ 个输入训练向量和 ‘m’ 个输出目标向量。

As shown in the following figure, the architecture of Hetero Associative Memory network has ‘n’ number of input training vectors and ‘m’ number of output target vectors.

hetero associative memory

Training Algorithm

为了进行训练,该网络使用希布尔或德尔塔学习规则。

For training, this network is using the Hebb or Delta learning rule.

Step 1 − 将所有权重初始化为零,即 wij = 0 (i = 1 to n, j = 1 to m)

Step 1 − Initialize all the weights to zero as wij = 0 (i = 1 to n, j = 1 to m)

Step 2 − 对每个输入向量执行步骤 3-4。

Step 2 − Perform steps 3-4 for each input vector.

Step 3 − 激活每个输入单元,如下所示 −

Step 3 − Activate each input unit as follows −

x_{i}\:=\:s_{i}\:(i\:=\:1\:to\:n)

Step 4 − 激活每个输出单元,如下所示 −

Step 4 − Activate each output unit as follows −

y_{j}\:=\:s_{j}\:(j\:=\:1\:to\:m)

Step 5 − 调整权重,如下所示 −

Step 5 − Adjust the weights as follows −

w_{ij}(new)\:=\:w_{ij}(old)\:+\:x_{i}y_{j}

Testing Algorithm

Step 1 − 为希布尔规则设置在训练期间获得的权重。

Step 1 − Set the weights obtained during training for Hebb’s rule.

Step 2 − 对每个输入向量执行步骤 3-5。

Step 2 − Perform steps 3-5 for each input vector.

Step 3 − 将输入单元的激活设置为等于输入向量的激活。

Step 3 − Set the activation of the input units equal to that of the input vector.

Step 4 - 为每个输出单元计算净输入 j = 1 to m;

Step 4 − Calculate the net input to each output unit j = 1 to m;

y_{inj}\:=\:\displaystyle\sum\limits_{i=1}^n x_{i}w_{ij}

Step 5 - 应用以下激活函数来计算输出

Step 5 − Apply the following activation function to calculate the output

y_{j}\:=\:f(y_{inj})\:=\:\begin{cases}+1 & if\:y_{inj}\:>\:0\\0 & if\:y_{inj}\:=\:0\\-1 & if\:y_{inj}\:<\:0\end{cases}
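The hetero-associative version differs only in the output dimension m and the three-valued output activation; a minimal sketch with illustrative helper names:

```python
# Hetero-associative memory: Hebb training on (s, t) pairs plus recall.
import numpy as np

def train_hetero(pairs):
    """w_ij(new) = w_ij(old) + s_i t_j for every training pair (s, t)."""
    n, m = len(pairs[0][0]), len(pairs[0][1])
    w = np.zeros((n, m))
    for s, t in pairs:
        w += np.outer(s, t)
    return w

def recall_hetero(w, x):
    """y_inj = sum_i x_i w_ij, then the three-valued activation above."""
    y_in = np.asarray(x) @ w
    return np.where(y_in > 0, 1, np.where(y_in < 0, -1, 0))
```

Presenting a stored input vector recalls its associated target vector.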

Artificial Neural Network - Hopfield Networks

霍普菲尔德神经网络由约翰·J·霍普菲尔德博士于 1982 年发明。它包括一个层,其包含一个或多个完全连接的递归神经元。霍普菲尔德网络通常用于自动关联和优化任务。

Hopfield neural network was invented by Dr. John J. Hopfield in 1982. It consists of a single layer which contains one or more fully connected recurrent neurons. The Hopfield network is commonly used for auto-association and optimization tasks.

Discrete Hopfield Network

以离散方式运行的霍普菲尔德网络,换句话说,可以认为输入和输出模式是离散向量,它们在性质上可以是二进制 (0,1) 或双极性 (+1,-1)。该网络具有对称的权重,没有自连接,即 wij = wji 且 wii = 0。

A Hopfield network that operates in a discrete fashion; in other words, it can be said that the input and output patterns are discrete vectors, which can be either binary (0, 1) or bipolar (+1, -1) in nature. The network has symmetrical weights with no self-connections i.e., wij = wji and wii = 0.

Architecture

以下是一些需要牢记关于离散霍普菲尔德网络的重要要点 −

Following are some important points to keep in mind about discrete Hopfield network −

  1. This model consists of neurons with one inverting and one non-inverting output.

  2. The output of each neuron should be the input of other neurons but not the input of self.

  3. Weight/connection strength is represented by wij.

  4. Connections can be excitatory as well as inhibitory. It would be excitatory, if the output of the neuron is same as the input, otherwise inhibitory.

  5. Weights should be symmetrical, i.e. wij = wji

hopfield

Y1 传输到 Y2YiYn 的输出分别具有权重 w12w1iw1n 。同样,其他弧线上也有权重。

The outputs from Y1 going to Y2, Yi and Yn have the weights w12, w1i and w1n respectively. Similarly, other arcs have the weights on them.

Training Algorithm

在离散霍普菲尔德网络训练期间,将更新权重。众所周知,我们可以有二进制输入矢量以及双极输入矢量。因此,在这两种情况下,都可以利用以下关系更新权重

During training of the discrete Hopfield network, weights will be updated. As we know, we can have binary input vectors as well as bipolar input vectors. Hence, in both cases, weight updates can be done with the following relation

Case 1 − 二进制输入模式

Case 1 − Binary input patterns

对于一组二进制模式 s(p), p = 1 to P

For a set of binary patterns s(p), p = 1 to P

在此, s(p) = s1(p), s2(p),…​, si(p),…​, sn(p)

Here, s(p) = s1(p), s2(p),…​, si(p),…​, sn(p)

权重矩阵由以下公式给出:

Weight Matrix is given by

w_{ij}\:=\:\sum_{p=1}^P[2s_{i}(p)\:-\:1][2s_{j}(p)\:-\:1]\:\:\:\:\:for\:i\:\neq\:j

Case 2 − 双极性输入模式

Case 2 − Bipolar input patterns

对于一组双极性模式 s(p), p = 1 to P

For a set of bipolar patterns s(p), p = 1 to P

在此, s(p) = s1(p), s2(p),…​, si(p),…​, sn(p)

Here, s(p) = s1(p), s2(p),…​, si(p),…​, sn(p)

权重矩阵由以下公式给出:

Weight Matrix is given by

w_{ij}\:=\:\sum_{p=1}^P[s_{i}(p)][s_{j}(p)]\:\:\:\:\:for\:i\:\neq\:j

Testing Algorithm

Step 1 − 使用赫布原理从训练算法获得的权重进行初始化。

Step 1 − Initialize the weights, which are obtained from training algorithm by using Hebbian principle.

Step 2 − 如果网络的激活尚未收敛,则执行步骤 3-9。

Step 2 − Perform steps 3-9, if the activations of the network are not yet converged.

Step 3 − 对于每个输入向量 X ,执行步骤 4-8。

Step 3 − For each input vector X, perform steps 4-8.

Step 4 − 按如下方式使网络的初始激活等于外部输入向量 X

Step 4 − Make initial activation of the network equal to the external input vector X as follows −

y_{i}\:=\:x_{i}\:\:\:for\:i\:=\:1\:to\:n

Step 5 − 对于每个单位 Yi ,执行步骤 6-9。

Step 5 − For each unit Yi, perform steps 6-9.

Step 6 − 按如下方式计算网络的净输入 −

Step 6 − Calculate the net input of the network as follows −

y_{ini}\:=\:x_{i}\:+\:\displaystyle\sum\limits_{j}y_{j}w_{ji}

Step 7 − 在净输入上应用激活计算输出 −

Step 7 − Apply the activation as follows over the net input to calculate the output −

y_{i}\:=\begin{cases}1 & if\:y_{ini}\:>\:\theta_{i}\\y_{i} & if\:y_{ini}\:=\:\theta_{i}\\0 & if\:y_{ini}\:<\:\theta_{i}\end{cases}

此处,$\theta_{i}$ 为阈值。

Here $\theta_{i}$ is the threshold.

Step 8 − 将此输出 yi 广播至所有其他单位。

Step 8 − Broadcast this output yi to all other units.

Step 9 − 测试网络是否收敛。

Step 9 − Test the network for convergence.
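The Case 2 training rule together with the recall steps can be sketched for bipolar patterns; the thresholds θi = 0 and the fixed number of update sweeps are assumptions:

```python
# Discrete Hopfield network: bipolar training and asynchronous recall.
import numpy as np

def hopfield_weights(patterns):
    """Case 2 training: w_ij = sum_p s_i(p) s_j(p), with w_ii = 0."""
    n = len(patterns[0])
    w = np.zeros((n, n))
    for s in patterns:
        w += np.outer(s, s)
    np.fill_diagonal(w, 0)                   # no self-connections
    return w

def hopfield_recall(w, x, sweeps=5):
    """Steps 4-8: asynchronous bipolar updates with theta_i = 0."""
    x = np.asarray(x, dtype=float)
    y = x.copy()                             # Step 4: initial activation
    for _ in range(sweeps):
        for i in range(len(y)):
            net = x[i] + w[i] @ y            # Step 6: net input y_ini
            if net > 0:
                y[i] = 1.0                   # Step 7: bipolar activation
            elif net < 0:
                y[i] = -1.0                  # (y_i unchanged when net == 0)
    return y
```

Recall from a one-bit-corrupted copy of a stored pattern converges back to the stored pattern.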

Energy Function Evaluation

能量函数被定义为系统状态的有界且非递增函数。

An energy function is defined as a function that is a bounded and non-increasing function of the state of the system.

能量函数 Ef,也称 Lyapunov function,决定离散霍普菲尔德网络的稳定性,其特征如下 −

The energy function Ef, also called a Lyapunov function, determines the stability of the discrete Hopfield network, and is characterized as follows −

E_{f}\:=\:-\frac{1}{2}\displaystyle\sum\limits_{i=1}^n\displaystyle\sum\limits_{j=1}^n y_{i}y_{j}w_{ij}\:-\:\displaystyle\sum\limits_{i=1}^n x_{i}y_{i}\:+\:\displaystyle\sum\limits_{i=1}^n \theta_{i}y_{i}

Condition − 在稳定网络中,每当节点状态发生变化,上述能量函数会减少。

Condition − In a stable network, whenever the state of node changes, the above energy function will decrease.

假设节点 i 的状态从 $y_i^{(k)}$ 更改为 $y_i^{(k\:+\:1)}$,那么能量变化 $\Delta E_{f}$ 由以下关系给出

Suppose node i changes state from $y_i^{(k)}$ to $y_i^{(k\:+\:1)}$; then the energy change $\Delta E_{f}$ is given by the following relation

\Delta E_{f}\:=\:E_{f}(y_i^{(k+1)})\:-\:E_{f}(y_i^{(k)})

=\:-\left(\begin{array}{c}\displaystyle\sum\limits_{j=1}^n w_{ij}y_j^{(k)}\:+\:x_{i}\:-\:\theta_{i}\end{array}\right)(y_i^{(k+1)}\:-\:y_i^{(k)})

=\:-\:(net_{i})\Delta y_{i}

此处 $\Delta y_{i}\:=\:y_i^{(k\:+\:1)}\:-\:y_i^{(k)}$

Here $\Delta y_{i}\:=\:y_i^{(k\:+\:1)}\:-\:y_i^{(k)}$

能量变化取决于这样一个事实,即每次只有一个单元能更新其激活。

The change in energy depends on the fact that only one unit can update its activation at a time.
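The non-increasing property can be checked numerically. The sketch below applies one discrete update (binary states, per the rule above) and evaluates Ef before and after; the random symmetric weights are an illustrative setup:

```python
# Numerical check that a single discrete Hopfield update never raises E_f.
import numpy as np

def hopfield_energy(w, y, x, theta):
    """E_f = -1/2 sum_ij y_i y_j w_ij - sum_i x_i y_i + sum_i theta_i y_i."""
    return -0.5 * (y @ w @ y) - x @ y + theta @ y

def update_unit(w, y, x, theta, i):
    """Apply the discrete (binary) update rule to unit i; return the new state."""
    y = y.copy()
    net = x[i] + w[i] @ y        # net input y_ini
    if net > theta[i]:
        y[i] = 1.0
    elif net < theta[i]:
        y[i] = 0.0               # y_i unchanged when net == theta_i
    return y
```

For symmetric weights with zero diagonal, the energy after any single-unit update is never larger than before, which is exactly the stability condition stated above.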

Continuous Hopfield Network

与离散霍普菲尔德网络相比,连续网络的时间是一个连续变量。它还用于自关联和优化问题,如旅行商问题。

In comparison with Discrete Hopfield network, continuous network has time as a continuous variable. It is also used in auto association and optimization problems such as travelling salesman problem.

Model − 该模型或架构可以通过添加电气组件(如放大器)构建,放大器可将输入电压映射到输出电压的 sigmoid 激活函数上。

Model − The model or architecture can be built up by adding electrical components such as amplifiers which can map the input voltage to the output voltage over a sigmoid activation function.

Energy Function Evaluation

E_f = -\frac{1}{2}\displaystyle\sum\limits_{i=1}^n\sum_{\substack{j = 1\\ j \ne i}}^n y_i y_j w_{ij} - \displaystyle\sum\limits_{i=1}^n x_i y_i + \frac{1}{\lambda} \displaystyle\sum\limits_{i=1}^n \sum_{\substack{j = 1\\ j \ne i}}^n w_{ij} g_{ri} \int_{0}^{y_i} a^{-1}(y) dy

此处 λ 是增益参数, gri 输入电导。

Here λ is the gain parameter and gri is the input conductance.

Boltzmann Machine

这些是具有递归结构的随机学习过程,是 ANN 中早期优化技术的基石。Boltzmann 机是由 Geoffrey Hinton 和 Terry Sejnowski 于 1985 年发明的。在 Hinton 对 Boltzmann 机的解读中可以看到更加清晰的解释。

These are stochastic learning processes having recurrent structure and are the basis of the early optimization techniques used in ANN. The Boltzmann Machine was invented by Geoffrey Hinton and Terry Sejnowski in 1985. Hinton's own words on the Boltzmann Machine make this clearer.

“此网络一个令人惊讶的特性是它仅使用局部可用的信息。权重的变化仅取决于它连接的两个单​​元的行为,即使该变化优化了全局度量值” - Ackley,Hinton 1985 年。

“A surprising feature of this network is that it uses only locally available information. The change of weight depends only on the behavior of the two units it connects, even though the change optimizes a global measure” - Ackley, Hinton 1985.

Boltzmann 机的一些重要要点 −

Some important points about Boltzmann Machine −

  1. They use recurrent structure.

  2. They consist of stochastic neurons, which have one of the two possible states, either 1 or 0.

  3. Some of the neurons in this are adaptive (free state) and some are clamped (frozen state).

  4. If we apply simulated annealing on discrete Hopfield network, then it would become Boltzmann Machine.

Objective of Boltzmann Machine

玻尔兹曼机的主要目的是优化问题的解决方案。玻尔兹曼机的任务就是优化与该特定问题相关的权重和数量。

The main purpose of Boltzmann Machine is to optimize the solution of a problem. It is the work of Boltzmann Machine to optimize the weights and quantity related to that particular problem.

Architecture

下图显示了玻尔兹曼机的结构。从图中可以清楚地看到,它是一个二维的单元阵列。这里,单位之间互联的权重为 –p ,其中 p > 0 。自连接的权重由 b 给出,其中 b > 0

The following diagram shows the architecture of Boltzmann machine. It is clear from the diagram, that it is a two-dimensional array of units. Here, weights on interconnections between units are –p where p > 0. The weights of self-connections are given by b where b > 0.

boltzmann

Training Algorithm

众所周知,玻尔兹曼机具有固定权重,因此不会有训练算法,因为我们不需要更新网络中的权重。但是,为了测试网络,我们必须设置权重以及找到一致函数 (CF)。

As we know, Boltzmann machines have fixed weights; hence there will be no training algorithm, as we do not need to update the weights in the network. However, to test the network we have to set the weights as well as find the consensus function (CF).

玻尔兹曼机具有一组单元 UiUj ,并且在其上具有双向连接。

Boltzmann machine has a set of units Ui and Uj and has bi-directional connections on them.

  1. We are considering the fixed weight say wij.

  2. wij ≠ 0 if Ui and Uj are connected.

  3. There also exists a symmetry in weighted interconnection, i.e. wij = wji.

  4. wii also exists, i.e. there would be the self-connection between units.

  5. For any unit Ui, its state ui would be either 1 or 0.

玻尔兹曼机的主要目标是最大化一致函数 (CF),其可以用以下关系给出

The main objective of Boltzmann Machine is to maximize the Consensus Function (CF) which can be given by the following relation

CF\:=\:\displaystyle\sum\limits_{i} \displaystyle\sum\limits_{j\leqslant i} w_{ij}u_{i}u_{j}

现在,当状态从 1 变为 0 或从 0 变为 1 时,一致性的变化可以用以下关系给出 −

Now, when the state changes from either 1 to 0 or from 0 to 1, then the change in consensus can be given by the following relation −

\Delta CF\:=\:(1\:-\:2u_{i})(w_{ii}\:+\:\displaystyle\sum\limits_{j\neq i} u_{j} w_{ij})

这里 uiUi 的当前状态。

Here ui is the current state of Ui.

系数 ( 1 - 2ui ) 的变化由以下关系给出 −

The variation in coefficient (1 - 2ui) is given by the following relation −

(1\:-\:2u_{i})\:=\:\begin{cases}+1, & U_{i}\:is\:currently\:off\\-1, & U_{i}\:is\:currently\:on\end{cases}

通常,单位 Ui 不会改变其状态,但如果改变,则信息将驻留在该单位的本地。通过这种改变,网络的一致性也会增加。

Generally, unit Ui does not change its state, but if it does then the information would be residing local to the unit. With that change, there would also be an increase in the consensus of the network.

网络接受单位状态变化的概率由以下关系给出 −

Probability of the network to accept the change in the state of the unit is given by the following relation −

AF(i,T)\:=\:\frac{1}{1\:+\:exp[-\frac{\Delta CF(i)}{T}]}

在此, T 是控制参数。当 CF 达到最高值时,它将减少。

Here, T is the controlling parameter. It will decrease as CF reaches the maximum value.

Testing Algorithm

Step 1 − 初始化以下内容以启动训练 −

Step 1 − Initialize the following to start the training −

  1. Weights representing the constraint of the problem

  2. Control Parameter T

Step 2 − 在停止条件不为真时,继续步骤 3-8。

Step 2 − Continue steps 3-8, when the stopping condition is not true.

Step 3 − 执行步骤 4-7。

Step 3 − Perform steps 4-7.

Step 4 − 假设某个状态已发生更改,并选择整数 I, J 作为 1 与 n 之间的随机值。

Step 4 − Assume that one of the states has changed and choose integers I, J as random values between 1 and n.

Step 5 − 如下计算共识度变化 −

Step 5 − Calculate the change in consensus as follows −

\Delta CF\:=\:(1\:-\:2u_{i})(w_{ii}\:+\:\displaystyle\sum\limits_{j\neq i} u_{j} w_{ij})

Step 6 − 计算此网络接受状态变化的概率

Step 6 − Calculate the probability that this network would accept the change in state

AF(i,T)\:=\:\frac{1}{1\:+\:exp[-\frac{\Delta CF(i)}{T}]}

Step 7 − 如下接受或拒绝此更改 −

Step 7 − Accept or reject this change as follows −

Case I − 如果 R < AF ,请接受更改。

Case I − if R < AF, accept the change.

Case II − 如果 R ≥ AF ,请拒绝更改。

Case II − if R ≥ AF, reject the change.

在此, R 是 0 到 1 之间的随机数。

Here, R is the random number between 0 and 1.

Step 8 − 如下减少控制参数(温度) −

Step 8 − Reduce the control parameter (temperature) as follows −

T(new) = 0.95 T(old)

Step 9 − 测试可能如下所示的停止条件 −

Step 9 − Test for the stopping conditions which may be as follows −

  1. Temperature reaches a specified value

  2. There is no change in state for a specified number of iterations
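The testing loop above can be sketched as follows; the all-positive weight matrix in the usage note and the iteration budget are illustrative assumptions:

```python
# Boltzmann machine testing sketch following Steps 1-9 above.
import numpy as np

def consensus(w, u):
    """CF = sum_i sum_{j<=i} w_ij u_i u_j."""
    total = 0.0
    for i in range(len(u)):
        for j in range(i + 1):
            total += w[i, j] * u[i] * u[j]
    return total

def boltzmann_run(w, T=10.0, decay=0.95, iters=400, seed=0):
    """Anneal the unit states so as to maximize the consensus function."""
    rng = np.random.default_rng(seed)
    u = rng.integers(0, 2, size=w.shape[0]).astype(float)
    for _ in range(iters):
        i = int(rng.integers(w.shape[0]))            # Step 4: random unit
        flipped = u.copy()
        flipped[i] = 1.0 - flipped[i]
        d_cf = consensus(w, flipped) - consensus(w, u)   # Step 5: delta CF
        z = np.clip(-d_cf / T, -50, 50)              # clip to avoid overflow
        af = 1.0 / (1.0 + np.exp(z))                 # Step 6: AF(i, T)
        if rng.random() < af:                        # Step 7: accept / reject
            u = flipped
        T *= decay                                   # Step 8: cool down
    return u, consensus(w, u)
```

With all-positive weights the consensus is maximized by turning every unit on, and the annealed run reaches that state.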

Brain-State-in-a-Box Network

脑状态神经网络 (BSB) 是一种非线性自关联神经网络,可以扩展到具有两层或更多层的异关联。它也类似于霍普菲尔德网络。它是由 J.A. 安德森、J.W. 西尔弗斯坦、S.A. 里茨和 R.S. 琼斯于 1977 年提出的。

The Brain-State-in-a-Box (BSB) neural network is a nonlinear auto-associative neural network and can be extended to hetero-association with two or more layers. It is also similar to Hopfield network. It was proposed by J.A. Anderson, J.W. Silverstein, S.A. Ritz and R.S. Jones in 1977.

关于 BSB 网络需要注意的一些要点 -

Some important points to remember about BSB Network −

  1. It is a fully connected network with the maximum number of nodes depending upon the dimensionality n of the input space.

  2. All the neurons are updated simultaneously.

  3. Neurons take values between -1 to +1.

Mathematical Formulations

BSB 网络中使用的节点函数是一个坡道函数,可定义如下 −

The node function used in BSB network is a ramp function, which can be defined as follows −

f(net)\:=\:min(1,\:max(-1,\:net))

该坡道函数是有界的,且是连续的。

This ramp function is bounded and continuous.

我们知道每个节点都会改变其状态,这可以通过以下数学关系来完成 −

As we know that each node would change its state, it can be done with the help of the following mathematical relation −

x_{i}(t\:+\:1)\:=\:f\left(\begin{array}{c}\displaystyle\sum\limits_{j=1}^n w_{i,j}x_{j}(t)\end{array}\right)

此处, xi(t)ith 节点在时间 t 的状态。

Here, xi(t) is the state of the ith node at time t.

ith 节点到 jth 节点的权重可以通过以下关系测量 −

Weights from ith node to jth node can be measured with the following relation −

w_{ij}\:=\:\frac{1}{P}\displaystyle\sum\limits_{p=1}^P (v_{p,i}\:v_{p,j})

此处, P 是训练模式的数量,它们是双极性的。

Here, P is the number of training patterns, which are bipolar.
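Both relations above fit in a few lines; the helper names are illustrative:

```python
# BSB sketch: weight construction from bipolar patterns plus one ramp update.
import numpy as np

def bsb_weights(patterns):
    """w_ij = (1/P) sum_p v_pi v_pj for P bipolar training patterns."""
    v = np.asarray(patterns, dtype=float)
    return v.T @ v / len(v)

def bsb_step(w, x):
    """x(t+1) = f(W x(t)), with the ramp f(net) = min(1, max(-1, net))."""
    return np.clip(w @ x, -1.0, 1.0)
```

Starting from a scaled-down copy of a stored bipolar pattern, one update pushes the state out to the corresponding corner of the box.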

Optimization Using Hopfield Network

优化是对设计、情况、资源和系统等进行调整以使其尽可能高效的行动。利用成本函数和能量函数之间的相似性,我们可以使用高度互连的神经元来解决优化问题。这种神经网络是霍普菲尔德网络,它由包含一个或多个完全连接的循环神经元的单层组成。这可用于优化。

Optimization is an action of making something such as design, situation, resource, and system as effective as possible. Using a resemblance between the cost function and the energy function, we can use highly interconnected neurons to solve optimization problems. Such a kind of neural network is the Hopfield network, which consists of a single layer containing one or more fully connected recurrent neurons. This can be used for optimization.

使用 Hopfield 网络进行优化时需要记住的重点 −

Points to remember while using Hopfield network for optimization −

  1. The energy function of the network must be at its minimum.

  2. It will find a satisfactory solution rather than select one out of the stored patterns.

  3. The quality of the solution found by Hopfield network depends significantly on the initial state of the network.

Travelling Salesman Problem

寻找推销员行进的最短路线是其中一个计算问题,可以通过使用霍普菲尔德神经网络来优化它。

Finding the shortest route travelled by the salesman is one of the computational problems, which can be optimized by using Hopfield neural network.

Basic Concept of TSP

旅行商问题 (TSP) 是一种经典的优化问题,在该问题中,推销员必须游览 n 个城市,这些城市彼此相连,同时保持成本和距离最小。例如,推销员必须游览一组城市 A、B、C、D,目标是找到最短的环形路线 A-B-C–D,以使成本最小化,其中还包括从最后一个城市 D 到第一个城市 A 的出行成本。

Travelling Salesman Problem (TSP) is a classical optimization problem in which a salesman has to travel n cities, which are connected with each other, keeping the cost as well as the distance travelled minimum. For example, the salesman has to travel a set of 4 cities A, B, C, D and the goal is to find the shortest circular tour, A-B-C–D, so as to minimize the cost, which also includes the cost of travelling from the last city D to the first city A.

travelling salesman problem

Matrix Representation

实际上,n 个城市 TSP 的每次行程都可以表示为 n × n 矩阵,其中 ith 行描述 ith 个城市在行程中的位置。此矩阵 M ,适用于 4 个城市 A、B、C、D,可以表示为以下形式 −

Actually, each tour of the n-city TSP can be expressed as an n × n matrix whose ith row describes the ith city’s position in the tour. This matrix, M, for 4 cities A, B, C, D can be expressed as follows −

M = \begin{bmatrix}A: & 1 & 0 & 0 & 0 \\B: & 0 & 1 & 0 & 0 \\C: & 0 & 0 & 1 & 0 \\D: & 0 & 0 & 0 & 1 \end{bmatrix}

Solution by Hopfield Network

在考虑 Hopfield 网络的 TSP 解决方案时,网络中的每个节点对应于矩阵中的一个元素。

While considering the solution of this TSP by Hopfield network, every node in the network corresponds to one element in the matrix.

Energy Function Calculation

为了成为优化的解决方案,能量函数必须为最小值。在以下约束的基础上,我们可以计算能量函数,如下所示 −

To be the optimized solution, the energy function must be minimum. On the basis of the following constraints, we can calculate the energy function as follows −

Constraint-I

第一个约束,在此基础上我们计算能量函数,即矩阵 M 中的每个行必须有一个元素等于 1,并且每一行中的其他元素都必须等于 0 ,因为每个城市在 TSP 行程中只能出现在一个位置。此约束在数学上可以写成以下形式 −

First constraint, on the basis of which we will calculate the energy function, is that one element must be equal to 1 in each row of matrix M and the other elements in each row must be equal to 0, because each city can occur in only one position in the TSP tour. This constraint can mathematically be written as follows −

\displaystyle\sum\limits_{j=1}^n M_{x,j}\:=\:1\:\:for\:x\:\in\:\lbrace1,\dots,n\rbrace

现在,基于上述约束,要最小化的能量函数将包含一个与以下项成正比的项 −

Now the energy function to be minimized, based on the above constraint, will contain a term proportional to −

\displaystyle\sum\limits_{x=1}^n \left(\begin{array}{c}1\:-\:\displaystyle\sum\limits_{j=1}^n M_{x,j}\end{array}\right)^2

Constraint-II

众所周知,在 TSP 中,一个城市可以在行程中的任何位置出现,因此,矩阵 M 中的每一列,必须有一个元素等于 1,而其他元素必须等于 0。此约束在数学上可以写成以下形式 −

As we know, in TSP one city can occur in any position in the tour hence in each column of matrix M, one element must equal to 1 and other elements must be equal to 0. This constraint can mathematically be written as follows −

\displaystyle\sum\limits_{x=1}^n M_{x,j}\:=\:1\:\:for\:j\:\in\:\lbrace1,\dots,n\rbrace

现在,基于上述约束,要最小化的能量函数将包含一个与以下项成正比的项 −

Now the energy function to be minimized, based on the above constraint, will contain a term proportional to −

\displaystyle\sum\limits_{j=1}^n \left(\begin{array}{c}1\:-\:\displaystyle\sum\limits_{x=1}^n M_{x,j}\end{array}\right)^2

Cost Function Calculation

假设一个 n × n 平方矩阵由 C 表示,表示 n 城市 TSP 的成本矩阵,其中 n > 0 。以下是在计算成本函数时的一些参数 −

Let us suppose a square matrix of order (n × n), denoted by C, is the cost matrix of the TSP for n cities, where n > 0. Following are some parameters used while calculating the cost function −

  1. Cx, y − The element of cost matrix denotes the cost of travelling from city x to y.

  2. Adjacency of cities x and y in the tour can be shown by the following relation −

M_{x,i}\:=\:1\:\:and\:\:M_{y,i\pm 1}\:=\:1

众所周知,在矩阵中,每个节点的输出值可以是 0 或 1,因此对于每对城市 A、B,我们可以为能量函数添加以下项−

As we know, in Matrix the output value of each node can be either 0 or 1, hence for every pair of cities A, B we can add the following terms to the energy function −

\displaystyle\sum\limits_{i=1}^n C_{x,y}M_{x,i}(M_{y,i+1}\:+\:M_{y,i-1})

基于上述成本函数和约束值,最终能量函数 E 可以表示如下−

On the basis of the above cost function and constraint value, the final energy function E can be given as follows −

E\:=\:\frac{1}{2}\displaystyle\sum\limits_{i=1}^n\displaystyle\sum\limits_{x}\displaystyle\sum\limits_{y\neq x}C_{x,y}M_{x,i}(M_{y,i+1}\:+\:M_{y,i-1})\:+

\:\begin{bmatrix}\gamma_{1} \displaystyle\sum\limits_{x} \left(\begin{array}{c}1\:-\:\displaystyle\sum\limits_{i} M_{x,i}\end{array}\right)^2\:+\: \gamma_{2} \displaystyle\sum\limits_{i} \left(\begin{array}{c}1\:-\:\displaystyle\sum\limits_{x} M_{x,i}\end{array}\right)^2 \end{bmatrix}

此处, γ1γ2 是两个加权常量。

Here, γ1 and γ2 are two weighing constants.
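The two constraint terms can be evaluated directly for a candidate tour matrix M; a quick sketch with γ1 = γ2 = 1:

```python
# Penalty from the two TSP constraints above for a candidate tour matrix M.
import numpy as np

def constraint_penalty(M):
    """Sum of the squared row and column deviations from the constraints."""
    row = ((1 - M.sum(axis=1)) ** 2).sum()   # Constraint-I: one 1 per row
    col = ((1 - M.sum(axis=0)) ** 2).sum()   # Constraint-II: one 1 per column
    return row + col
```

A valid tour (a permutation matrix) has zero penalty; any matrix that places a city twice, or two cities in one position, is penalized.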

Other Optimization Techniques

Iterated Gradient Descent Technique

梯度下降,也称为最速下降,是一种迭代优化算法,用于查找函数的局部最小值。在最小化该函数时,我们关心的是要最小化的代价或误差(请记住旅行商问题)。它广泛用于深度学习中,在各种情况下都很有用。这里要记住的一点是我们关心的是局部优化,而不是全局优化。

Gradient descent, also known as steepest descent, is an iterative optimization algorithm for finding a local minimum of a function. While minimizing the function, we are concerned with the cost or error to be minimized (remember the Travelling Salesman Problem). It is extensively used in deep learning and is useful in a wide variety of situations. The point to be remembered here is that we are concerned with local optimization and not global optimization.

Main Working Idea

我们可以在以下步骤的帮助下理解梯度下降的主要工作思路 −

We can understand the main working idea of gradient descent with the help of the following steps −

  1. First, start with an initial guess of the solution.

  2. Then, take the gradient of the function at that point.

  3. Later, repeat the process by stepping the solution in the negative direction of the gradient.

通过执行上述步骤,该算法最终将在梯度为零时收敛。

By following the above steps, the algorithm will eventually converge where the gradient is zero.

optimization

Mathematical Concept

假设我们有一个函数 f(x) ,我们正在尝试找到此函数的最小值。以下是查找 f(x) 的最小值步骤。

Suppose we have a function f(x) and we are trying to find the minimum of this function. Following are the steps to find the minimum of f(x).

  1. First, give some initial value $x_{0}$ for $x$.

  2. Now take the gradient $\nabla f$ of the function, with the intuition that the gradient gives the slope of the curve at that $x$ and its direction points to the increase in the function, in order to find out the best direction to minimize it.

  3. Now change x as follows − $x_{n\:+\:1}\:=\:x_{n}\:-\:\theta \nabla f(x_{n})$

此处, θ > 0 是训练速率(步长),它迫使算法进行小的跳转。

Here, θ > 0 is the training rate (step size) that forces the algorithm to take small jumps.

Estimating Step Size

实际上,错误的步长 θ 可能达不到收敛,因此仔细选择步长非常重要。在选择步长时必须记住以下几点:

Actually, a wrong step size θ may not reach convergence, hence careful selection of the step size is very important. The following points must be remembered while choosing the step size −

  1. Do not choose a step size that is too large, otherwise it will have a negative impact, i.e. the algorithm will diverge rather than converge.

  2. Do not choose a step size that is too small, otherwise it will take a lot of time to converge.

关于选择步长的一些选项 -

Some options with regard to choosing the step size −

  1. One option is to choose a fixed step size.

  2. Another option is to choose a different step size for every iteration.
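
Both warnings above can be seen directly on the toy function f(x) = x², whose gradient is 2x. This is a hedged sketch; the specific step-size values are illustrative choices, not values from the text.

```python
def run_gd(theta, steps=50, x0=1.0):
    """Gradient descent on f(x) = x**2 (gradient 2x) with a fixed step size."""
    x = x0
    for _ in range(steps):
        x = x - theta * 2 * x   # each step multiplies x by (1 - 2*theta)
    return x

too_large = run_gd(theta=1.1)    # |1 - 2*theta| > 1, so the iterates diverge
sensible  = run_gd(theta=0.1)    # converges rapidly toward the minimum at 0
too_small = run_gd(theta=0.001)  # still converging, but only very slowly
```

After 50 steps the overly large step size has blown up, the moderate one is extremely close to the minimum, and the tiny one has barely moved away from the starting point.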

Simulated Annealing

模拟退火 (SA)的基本概念源自固体的退火过程。在退火过程中,如果我们加热金属使其超过熔点,然后冷却它,则结构特性将取决于冷却速率。我们还可以说 SA 模拟了退火的冶金过程。

The basic concept of Simulated Annealing (SA) is motivated by the annealing of solids. In the process of annealing, if we heat a metal above its melting point and then cool it down, the structural properties will depend upon the rate of cooling. We can also say that SA simulates the metallurgical process of annealing.

Use in ANN

SA 是一种随机计算方法,受退火类比启发,用于逼近给定函数的全局优化。我们可以使用 SA 来训练前馈神经网络。

SA is a stochastic computational method, inspired by the annealing analogy, for approximating the global optimum of a given function. We can use SA to train feed-forward neural networks.

Algorithm

Step 1 - 生成一个随机解决方案。

Step 1 − Generate a random solution.

Step 2 - 使用一些成本函数计算其成本。

Step 2 − Calculate its cost using some cost function.

Step 3 - 生成一个随机邻域解决方案。

Step 3 − Generate a random neighboring solution.

Step 4 - 通过相同的成本函数计算新的解决方案成本。

Step 4 − Calculate the new solution cost by the same cost function.

Step 5 - 如下比较新解决方案的成本和旧解决方案的成本 -

Step 5 − Compare the cost of a new solution with that of an old solution as follows −

如果 Cost(新解) < Cost(旧解),则移动到新解;否则,以一个取决于成本差和当前温度的概率(通常为 exp(−ΔCost/T))接受较差的解,从而使算法能够跳出局部最小值。

If Cost(new solution) < Cost(old solution), then move to the new solution; otherwise, accept the worse solution with a probability that depends on the cost difference and the current temperature (typically exp(−ΔCost/T)). This occasional acceptance of worse solutions is what allows SA to escape local minima.

Step 6 - 测试停止条件,可能是达到最大迭代次数或获得可接受的解决方案。

Step 6 − Test for the stopping condition, which may be that the maximum number of iterations has been reached or that an acceptable solution has been obtained.
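
The six steps can be put together as a short sketch. Everything concrete here — the 1-D cost function x², the ±1 neighbourhood, the geometric cooling schedule, and the numeric constants — is an illustrative assumption, not part of the original algorithm description.

```python
import math
import random

def simulated_annealing(cost, x0, t0=10.0, cooling=0.95, steps=2000):
    """Minimal SA loop following Steps 1-6 for a 1-D cost function."""
    random.seed(0)                     # fixed seed: reproducible sketch
    x, c = x0, cost(x0)                # Steps 1-2: starting solution and its cost
    best_x, best_c = x, c
    t = t0
    for _ in range(steps):
        x_new = x + random.uniform(-1.0, 1.0)   # Step 3: neighbouring solution
        c_new = cost(x_new)                     # Step 4: its cost
        # Step 5: always accept improvements; accept worse solutions with
        # probability exp(-delta/T) so the search can escape local minima
        if c_new < c or random.random() < math.exp((c - c_new) / t):
            x, c = x_new, c_new
            if c < best_c:
                best_x, best_c = x, c
        t *= cooling                   # cool down toward greedy behaviour
    return best_x, best_c              # Step 6: stop after the iteration budget

best_x, best_c = simulated_annealing(lambda x: x * x, x0=8.0)
```

At high temperature the search wanders freely; as the temperature drops it behaves more and more like greedy descent, which is why it homes in on the minimum near zero.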

Artificial Neural Network - Genetic Algorithm

自然一直都是全人类的伟大灵感来源。遗传算法(GA)是基于自然选择和遗传学概念的搜索算法。GA 是一个更大计算分支的子集,该分支称为 Evolutionary Computation

Nature has always been a great source of inspiration to all mankind. Genetic Algorithms (GAs) are search-based algorithms based on the concepts of natural selection and genetics. GAs are a subset of a much larger branch of computation known as Evolutionary Computation.

GA 是约翰·霍兰及其学生和密歇根大学的同事们(最著名的是戴维·E·戈德堡)开发的,并且此后已经在各种优化问题上取得了高度成功。

GAs were developed by John Holland and his students and colleagues at the University of Michigan, most notably David E. Goldberg, and have since been tried on various optimization problems with a high degree of success.

在 GA 中,我们有一个给定问题的候选解池或人群。这些解然后经过重组和变异(如同自然遗传学),生成新的孩子,并在这个过程中重复各种世代。每个个体(或候选解)都分配一个适应值(基于其目标函数值),并且适应性强的个体有较高的交配机会,并产生更多“适应性更强”的个体。这与达尔文的“适者生存”理论是一致的。

In GAs, we have a pool or a population of possible solutions to the given problem. These solutions then undergo recombination and mutation (like in natural genetics), producing new children, and the process is repeated over various generations. Each individual (or candidate solution) is assigned a fitness value (based on its objective function value) and the fitter individuals are given a higher chance to mate and yield more “fitter” individuals. This is in line with the Darwinian Theory of “Survival of the Fittest”.

通过这种方式,我们不断“进化”出更好的个体或解决方案,直至达到停止准则。

In this way, we keep “evolving” better individuals or solutions over generations, till we reach a stopping criterion.

遗传算法在本质上是充分随机化的,但它们比随机局部搜索(我们仅尝试各种随机解决方案,同时跟踪迄今为止最好的解决方案)表现得更好,因为它们也利用历史信息。

Genetic Algorithms are sufficiently randomized in nature; however, they perform much better than random local search (in which we just try various random solutions, keeping track of the best so far), because they exploit historical information as well.

Advantages of GAs

GA 具有各种优势,使它们非常受欢迎。这些包括−

GAs have various advantages which have made them immensely popular. These include −

  1. Do not require any derivative information (which may not be available for many real-world problems).

  2. Are faster and more efficient as compared to the traditional methods.

  3. Have very good parallel capabilities.

  4. Optimize both continuous and discrete functions, as well as multi-objective problems.

  5. Provide a list of “good” solutions and not just a single solution.

  6. Always get an answer to the problem, which gets better over time.

  7. Are useful when the search space is very large and there are a large number of parameters involved.

Limitations of GAs

与任何技术一样,遗传算法也有几个局限性。这些包括 −

Like any technique, GAs also suffer from a few limitations. These include −

  1. GAs are not suited for all problems, especially problems which are simple and for which derivative information is available.

  2. Fitness value is calculated repeatedly, which might be computationally expensive for some problems.

  3. Being stochastic, there are no guarantees on the optimality or the quality of the solution.

  4. If not implemented properly, the GA may not converge to the optimal solution.

GA – Motivation

遗传算法有能力“足够快”地提供“足够好”的解决方案。这使得遗传算法在解决优化问题中很有吸引力。需要遗传算法的原因如下 −

Genetic Algorithms have the ability to deliver a “good-enough” solution “fast-enough”. This makes GAs attractive for use in solving optimization problems. The reasons why GAs are needed are as follows −

Solving Difficult Problems

在计算机科学中,有很多问题是 NP-Hard 。这基本上意味着,即使是最强大的计算系统也要花很长时间(甚至数年!)才能解决该问题。在这种情况下,遗传算法被证明是一种有效工具,可以在短时间内提供 usable near-optimal solutions

In computer science, there is a large set of problems, which are NP-Hard. What this essentially means is that, even the most powerful computing systems take a very long time (even years!) to solve that problem. In such a scenario, GAs prove to be an efficient tool to provide usable near-optimal solutions in a short amount of time.

Failure of Gradient Based Methods

基于传统微积分的方法通过从一个随机点开始并朝梯度方向移动来工作,直到我们到达山顶。这种技术有效,并且非常适合单峰目标函数,例如线性回归中的成本函数。然而,在大多数实际情况下,我们有一个非常复杂的问题,称为景观,由许多山峰和许多山谷组成,这导致此类方法失败,因为它们倾向于停滞在局部最优值,如下图所示。

Traditional calculus-based methods work by starting at a random point and moving in the direction of the gradient until we reach the top of the hill. This technique is efficient and works very well for single-peaked objective functions like the cost function in linear regression. However, in most real-world situations we have a very complex objective landscape, made of many peaks and many valleys, which causes such methods to fail, as they suffer from an inherent tendency of getting stuck at local optima, as shown in the following figure.

[Figure: a gradient-based method getting stuck at a local optimum instead of the global optimum]
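
A tiny experiment illustrates this tendency. The function (x² − 1)² + 0.3x and the starting point below are assumptions invented for this sketch; they are not from the original text.

```python
def grad_descent(df, x0, theta=0.01, steps=5000):
    """Plain gradient descent using a hand-coded derivative df."""
    x = x0
    for _ in range(steps):
        x = x - theta * df(x)
    return x

# f(x) = (x^2 - 1)^2 + 0.3x has two valleys: a local minimum near
# x = +0.96 and the (lower) global minimum near x = -1.04.
f  = lambda x: (x * x - 1) ** 2 + 0.3 * x
df = lambda x: 4 * x * (x * x - 1) + 0.3

# Starting on the right-hand slope, the descent slides into the nearby
# local valley and never discovers the better valley on the left.
x_stuck = grad_descent(df, x0=2.0)
```

The descent converges to the valley nearest its starting point even though a strictly better solution exists elsewhere — precisely the failure mode that motivates population-based methods such as GAs.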

Getting a Good Solution Fast

旅行商问题 (TSP) 等一些困难问题具有实际应用,例如寻路和超大规模集成 (VLSI) 设计。现在想象一下您正在使用 GPS 导航系统,它需要几分钟(甚至几小时)来计算从源到目的地的“最佳”路径。在这样的实际应用中,延迟是不可接受的,因此需要一个“足够好”的解决方案,即“快速”交付的解决方案。

Some difficult problems like the Travelling Salesman Problem (TSP) have real-world applications like path finding and VLSI design. Now imagine that you are using your GPS navigation system, and it takes a few minutes (or even a few hours) to compute the “optimal” path from the source to the destination. Delay in such real-world applications is not acceptable, and therefore a “good-enough” solution, which is delivered “fast”, is what is required.

How to Use GA for Optimization Problems?

我们已经知道,优化是为了使设计、情况、资源和系统等尽可能有效。优化过程在以下图表中显示。

We already know that optimization is the action of making something such as a design, situation, resource, or system as effective as possible. The optimization process is shown in the following diagram.

[Figure: the optimization process]

Stages of GA Mechanism for Optimization Process

以下是用于优化问题时 GA 机制的阶段。

Following are the stages of the GA mechanism when used for the optimization of problems.

  1. Generate the initial population randomly.

  2. Select the initial solution with the best fitness values.

  3. Recombine the selected solutions using mutation and crossover operators.

  4. Insert offspring into the population.

  5. Now, if the stop condition is met, then return the solution with the best fitness value. Else, go to step 2.
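
The five stages above can be sketched as a toy GA. Everything concrete here — the “OneMax” fitness function (count of 1-bits), tournament selection, one-point crossover, bit-flip mutation, and all parameter values — is an illustrative assumption, not prescribed by the text.

```python
import random

def genetic_algorithm(n_bits=20, pop_size=30, generations=60,
                      crossover_rate=0.9, mutation_rate=0.02):
    """Toy GA maximising the number of 1-bits in a bitstring ('OneMax')."""
    random.seed(1)  # fixed seed so the sketch is reproducible

    def fitness(bits):
        return sum(bits)

    def select(pop):
        # Stage 2: tournament selection favours the fitter of two candidates
        a, b = random.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    # Stage 1: generate the initial population randomly
    pop = [[random.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]

    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1, p2 = select(pop), select(pop)
            # Stage 3: recombine via one-point crossover ...
            if random.random() < crossover_rate:
                cut = random.randint(1, n_bits - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # ... and bit-flip mutation
            child = [bit ^ 1 if random.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        pop = children  # Stage 4: offspring replace the population

    # Stage 5: generation budget exhausted; return the fittest survivor
    return max(pop, key=fitness)

best = genetic_algorithm()
```

Over the generations, selection pressure steadily drives the population toward bitstrings that are mostly (often entirely) ones.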

Applications of Neural Networks

在研究神经网络已广泛使用的领域之前,我们需要了解为什么神经网络将成为首选应用程序。

Before studying the fields where ANN has been used extensively, we need to understand why ANN would be the preferred choice of application.

Why Artificial Neural Networks?

我们需要通过人的例子来理解上述问题的答案。作为一个孩子,我们曾经在长辈的帮助下学习事物,其中包括父母或老师。然后,通过自学或实践,我们在整个生命中不断学习。科学家和研究人员也像人类一样,让机器变得智能,而神经网络由于以下原因,在其中扮演着非常重要的角色:

We need to understand the answer to the above question with an example of a human being. As children, we learn things with the help of our elders, such as our parents or teachers. Later, by self-learning or practice, we keep learning throughout our lives. Scientists and researchers are likewise making machines intelligent, just like a human being, and ANN plays a very important role in this for the following reasons −

  1. With the help of neural networks, we can find solutions to problems for which an algorithmic method is expensive or does not exist.

  2. Neural networks can learn by example, hence we do not need to program them to a great extent.

  3. Neural networks can achieve higher accuracy and significantly faster speed than conventional systems.

Areas of Application

以下是神经网络使用的一些领域。它表明神经网络在其开发和应用中采用跨学科的方法。

Following are some of the areas where ANN is being used. They suggest that ANN follows an interdisciplinary approach in its development and applications.

Speech Recognition

语言在人际交往中占有突出地位。因此,人们自然会期望与计算机进行语音交互。在当前时代,为了与机器进行通信,人类仍然需要复杂的语言,这些语言难以学习和使用。为了消除这种交流障碍,一种简单的解决方案可能是以机器能够理解的口语进行交流。

Speech occupies a prominent role in human-human interaction. Therefore, it is natural for people to expect speech interfaces with computers. In the present era, for communication with machines, humans still need sophisticated languages which are difficult to learn and use. To ease this communication barrier, a simple solution could be communication in a spoken language that the machine is able to understand.

在这一领域已经取得了很大进展,然而,此类系统仍然面临词汇或语法有限的问题,以及针对不同条件的不同说话人对系统进行再培训的问题。神经网络在这个领域发挥着重要作用。下列神经网络用于语音识别:

Great progress has been made in this field; however, such systems still face the problems of a limited vocabulary or grammar, along with the need to retrain the system for different speakers in different conditions. ANN is playing a major role in this area. The following ANNs have been used for speech recognition −

  1. Multilayer networks

  2. Multilayer networks with recurrent connections

  3. Kohonen self-organizing feature map

最适合这种网络的是 Kohonen 自组织特征映射,它的输入是语音波形的短片段。它会将同类音素映射到输出阵列中,称为特征提取技术。在提取特征之后,借助一些作为后端处理的声学模型,它将识别出说话内容。

The most useful network for this is the Kohonen Self-Organizing feature map, which takes short segments of the speech waveform as its input. It maps phonemes of the same kind onto the same region of the output array, a process known as feature extraction. After the features are extracted, some acoustic models serving as back-end processing recognize the utterance.
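
As a rough illustration of how such a map clusters similar inputs, here is a minimal one-dimensional Kohonen SOM. The toy 2-D inputs stand in for phoneme feature vectors, and all sizes, rates, and schedules are invented for this sketch rather than taken from the text.

```python
import random

def train_som(data, n_units=5, dim=2, epochs=200, lr0=0.5, radius0=2.0):
    """Train a 1-D Kohonen self-organizing map on dim-dimensional inputs."""
    random.seed(0)  # reproducible initial weights
    weights = [[random.random() for _ in range(dim)] for _ in range(n_units)]

    def bmu(x):
        # Best-matching unit: the unit with the smallest squared
        # distance between its weight vector and the input
        return min(range(n_units), key=lambda i: sum(
            (weights[i][j] - x[j]) ** 2 for j in range(dim)))

    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # decaying learning rate
        radius = max(radius0 * (1 - frac), 0.5)  # shrinking neighbourhood
        for x in data:
            winner = bmu(x)
            for i in range(n_units):
                if abs(i - winner) <= radius:    # grid neighbours of the BMU
                    for j in range(dim):
                        weights[i][j] += lr * (x[j] - weights[i][j])
    return weights, bmu

# Two clusters of toy "feature vectors": the trained map moves its units
# toward the data so that similar inputs activate nearby grid positions.
data = [[0.0, 0.0], [0.05, 0.05], [1.0, 1.0], [0.95, 0.95]]
weights, bmu = train_som(data)
```

The winner-takes-most update is the essence of the feature-extraction behaviour described above: each region of the map comes to respond to one kind of input.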

Character Recognition

这是一个属于模式识别一般领域的有趣问题。许多神经网络已被开发用于自动识别手写字符,无论是字母还是数字。以下是用于字符识别的某些 ANN −

It is an interesting problem which falls under the general area of Pattern Recognition. Many neural networks have been developed for automatic recognition of handwritten characters, either letters or digits. Following are some ANNs which have been used for character recognition −

  1. Multilayer neural networks such as Backpropagation neural networks.

  2. Neocognitron

虽然反向传播神经网络有几个隐藏层,但从一层到下一层的连接模式是局部化的。类似地,新认知网络也拥有几个隐藏层,并且它针对此类应用分层进行训练。

Though back-propagation neural networks have several hidden layers, the pattern of connection from one layer to the next is localized. Similarly, the neocognitron also has several hidden layers, and its training is done layer by layer for such applications.

Signature Verification Application

签名是在法律交易中授权和验证某个人的最有用的方法之一。签名验证技术是一种非视觉技术。

Signatures are one of the most useful ways to authorize and authenticate a person in legal transactions. Signature verification is a non-vision-based technique.

对于此应用,第一种方法是提取特征或代表签名的一组几何特征。利用这些特征集,我们必须使用有效的神经网络算法训练神经网络。此经过训练的神经网络在验证阶段将签名分类为真品或伪造。

For this application, the first approach is to extract the feature set, or rather the geometrical feature set, representing the signature. With these feature sets, we have to train the neural network using an efficient neural network algorithm. The trained neural network then classifies the signature as genuine or forged during the verification stage.

Human Face Recognition

这是识别给定人脸的生物识别方法之一。这是典型的任务,因为其表征为“非人脸”图像。但如果神经网络接受了良好的训练,那么它可以根据图像将图像分为两类,即有脸的图像和没有脸的图像。

It is one of the biometric methods used to identify a given face. It is a challenging task because of the difficulty of characterizing “non-face” images. However, if a neural network is well trained, it can divide images into two classes, namely images having faces and images that do not have faces.

首先,必须对所有输入图像进行预处理。然后,必须减小该图像的维数。最后,必须使用神经网络训练算法对其进行分类。以下神经网络用于使用经过预处理的图像进行训练目的 −

First, all the input images must be preprocessed. Then, the dimensionality of the images must be reduced. Finally, they must be classified using a neural network training algorithm. The following neural networks are used for training with preprocessed images −

  1. Fully-connected multilayer feed-forward neural network trained with the help of back-propagation algorithm.

  2. For dimensionality reduction, Principal Component Analysis (PCA) is used.
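
To make the dimensionality-reduction step concrete, here is a bare-bones sketch of computing the first principal component via power iteration, in pure Python with no libraries. The toy data and the iteration count are assumptions for illustration — in face recognition the rows would be flattened pixel vectors.

```python
def pca_first_component(data, iters=100):
    """Return the first principal component of `data` (list of equal-length rows)."""
    n, d = len(data), len(data[0])
    # Center the data: subtract the mean of each dimension
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    # Covariance matrix C (d x d)
    cov = [[sum(centered[i][a] * centered[i][b] for i in range(n)) / n
            for b in range(d)] for a in range(d)]
    # Power iteration: repeatedly apply C and renormalize; the vector
    # converges to C's dominant eigenvector, i.e. the direction of
    # greatest variance (the first principal component)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Toy data lying on the line y = 2x: the first component should point
# along (1, 2), up to normalization
component = pca_first_component([[1, 2], [2, 4], [3, 6], [-1, -2], [-2, -4]])
```

Projecting each image onto the top few such components yields a compact representation that the feed-forward classifier above can be trained on.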