Artificial Neural Network Tutorial
Unsupervised Learning
As the name suggests, this type of learning is done without the supervision of a teacher; the learning process is independent. During the training of an ANN under unsupervised learning, input vectors of a similar type are combined to form clusters. When a new input pattern is applied, the neural network gives an output response indicating the class to which the input pattern belongs. In this setting, there is no feedback from the environment about what the desired output should be or whether it is correct or incorrect. Hence, in this type of learning, the network itself must discover the patterns and features in the input data, and the relation of the input data to the output.
Winner-Takes-All Networks
These kinds of networks are based on the competitive learning rule and use the strategy of choosing the neuron with the greatest total input as the winner. The connections between the output neurons express the competition between them: one of them will be ‘ON’, meaning it is the winner, and the others will be ‘OFF’.
Following are some of the networks based on this simple concept using unsupervised learning.
Hamming Network
In most neural networks using unsupervised learning, it is essential to compute distances and perform comparisons. One such network is the Hamming network, which clusters every given input vector into one of several groups. Following are some important features of Hamming networks −
- Lippmann started working on Hamming networks in 1987.
- It is a single-layer network.
- The inputs can be either binary {0, 1} or bipolar {-1, 1}.
- The weights of the net are calculated from the exemplar vectors.
- It is a fixed-weight network, which means the weights remain the same even during training.
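As a concrete illustration of these features, here is a minimal sketch (in Python/NumPy, with made-up exemplar vectors) of the feedforward matching step of a Hamming network: with bipolar exemplars, the standard construction of weights W = E/2 plus a bias of n/2 yields a score equal to n minus the Hamming distance to each exemplar.

```python
import numpy as np

# Two illustrative bipolar exemplar vectors; each row defines one class.
exemplars = np.array([[1, -1, 1, -1],
                      [-1, -1, 1, 1]])

n = exemplars.shape[1]
W = exemplars / 2.0   # fixed weights calculated from the exemplar vectors
b = n / 2.0           # bias term

def hamming_scores(x):
    """Score each class: n minus the Hamming distance from x to the exemplar."""
    return W @ x + b

x = np.array([1, -1, 1, -1])     # a new bipolar input pattern
scores = hamming_scores(x)
print(scores)                    # [4. 2.] -> exemplar 0 matches in all 4 positions
print("closest exemplar:", np.argmax(scores))
```

In practice, the scores are then fed into a Max Net (described next), which iterates until only the best-matching class remains active.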
Max Net
This is also a fixed-weight network, which serves as a subnet for selecting the node with the highest input. All the nodes are fully interconnected, and symmetric weights exist on all these weighted interconnections.
Architecture
It uses an iterative mechanism in which each node receives inhibitory inputs from all other nodes through connections. The single node whose value is maximum will be active, i.e., the winner, and the activations of all other nodes will be inactive. Max Net uses the activation function f(x)\:=\:\begin{cases}x & if\:x > 0\\0 & if\:x \leq 0\end{cases}
The task of this net is accomplished by a self-excitation weight of +1 and a mutual inhibition magnitude $\varepsilon$, which is set in the range $0 < \varepsilon < \frac{1}{m}$, where “m” is the total number of nodes.
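The following is a minimal sketch of the Max Net iteration under the assumptions above (self-excitation +1, mutual inhibition $\varepsilon$ with $0 < \varepsilon < \frac{1}{m}$, and the activation function given earlier); the initial activations are made-up values.

```python
import numpy as np

def maxnet(a, epsilon=None, max_iter=100):
    """Iterate Max Net until at most one node remains active.

    a       -- initial activations, one per node
    epsilon -- mutual inhibition magnitude, 0 < epsilon < 1/m
    """
    a = np.asarray(a, dtype=float)
    m = len(a)
    if epsilon is None:
        epsilon = 1.0 / (2.0 * m)        # any value in (0, 1/m) will do
    for _ in range(max_iter):
        # self-excitation weight +1, inhibition -epsilon from every other node
        a_new = a - epsilon * (a.sum() - a)
        a_new = np.where(a_new > 0, a_new, 0.0)   # f(x) = x if x > 0 else 0
        if np.count_nonzero(a_new) <= 1:          # a single winner is left
            return a_new
        a = a_new
    return a

print(maxnet([0.2, 0.4, 0.6, 0.8]))   # only the node that started at 0.8 survives
```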
Competitive Learning in ANN
It is concerned with unsupervised training, in which the output nodes try to compete with each other to represent the input pattern. To understand this learning rule, we must first understand the competitive network, which is explained as follows −
Basic Concept of Competitive Network
This network is just like a single-layer feed-forward network with feedback connections between the outputs. The connections between the outputs are of the inhibitory type (shown by dotted lines), which means the competitors never support themselves.
Basic Concept of Competitive Learning Rule
As said earlier, there is competition among the output nodes, so the main concept is this: during training, the output unit with the highest activation for a given input pattern is declared the winner. This rule is also called winner-takes-all because only the winning neuron is updated, while the rest of the neurons are left unchanged.
Mathematical Formulation
Following are the three important factors in the mathematical formulation of this learning rule −
- Condition to be a winner − Suppose a neuron yk wants to be the winner; then the following condition must hold −

  y_{k}\:=\:\begin{cases}1 & if\:v_{k} > v_{j}\:for\:all\:\:j,\:j\:\neq\:k\\0 & otherwise\end{cases}

  It means that if any neuron, say yk, wants to win, then its induced local field (the output of the summation unit), say vk, must be the largest among all the other neurons in the network.

- Condition on the sum total of weights − Another constraint of the competitive learning rule is that the sum total of the weights to a particular output neuron must be 1. For example, if we consider neuron k, then

  \displaystyle\sum\limits_{j} w_{kj}\:=\:1\:\:\:\:for\:all\:\:k

- Change of weight for the winner − If a neuron does not respond to the input pattern, then no learning takes place in that neuron. However, if a particular neuron wins, then the corresponding weights are adjusted as follows (a code sketch follows this list) −

  \Delta w_{kj}\:=\:\begin{cases}\alpha(x_{j}\:-\:w_{kj}), & if\:neuron\:k\:wins\\0 & if\:neuron\:k\:loses\end{cases}

  Here $\alpha$ is the learning rate. This clearly shows that we favor the winning neuron by adjusting its weight; if a neuron loses, we need not bother to re-adjust its weight.
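Below is a minimal sketch of one winner-takes-all update in Python/NumPy, assuming the rule above with $\Delta w_{kj} = \alpha(x_j - w_{kj})$ for the winner; the matrix sizes, input values, and learning rate are illustrative only.

```python
import numpy as np

def competitive_step(W, x, alpha=0.1):
    """One winner-takes-all update: only the winning neuron's weights move toward x.

    W     -- weight matrix, one row per output neuron
    x     -- input pattern
    alpha -- learning rate
    """
    v = W @ x                      # induced local fields v_k
    k = int(np.argmax(v))          # condition to be a winner: largest v_k
    W[k] += alpha * (x - W[k])     # delta w_kj = alpha * (x_j - w_kj), winner only
    return k, W

rng = np.random.default_rng(0)
W = rng.random((3, 4))
W /= W.sum(axis=1, keepdims=True)  # sum-of-weights condition: each row sums to 1
x = np.array([0.9, 0.1, 0.8, 0.2])
winner, W = competitive_step(W, x)
print("winner:", winner)
```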
K-means Clustering Algorithm
K-means is one of the most popular clustering algorithms, in which we use the concept of a partitioning procedure. We start with an initial partition and repeatedly move patterns from one cluster to another until we get a satisfactory result.
Algorithm
Step 1 − Select k points as the initial centroids. Initialize the k prototypes (w1,…,wk); for example, we can identify them with randomly chosen input vectors −

W_{j}\:=\:i_{p},\:\:\: where\:j\:\in \lbrace1,….,k\rbrace\:and\:p\:\in \lbrace1,….,n\rbrace
Each cluster Cj is associated with prototype wj.
Step 2 − Repeat steps 3-5 until E no longer decreases, or the cluster membership no longer changes.
Step 3 − For each input vector ip, where p ∈ {1,…,n}, put ip in the cluster Cj* with the nearest prototype wj*, i.e., the one satisfying the following relation −

|i_{p}\:-\:w_{j*}|\:\leq\:|i_{p}\:-\:w_{j}|,\:j\:\in \lbrace1,….,k\rbrace
Step 4 − For each cluster Cj, where j ∈ {1,…,k}, update the prototype wj to be the centroid of all samples currently in Cj, so that
w_{j}\:=\:\sum_{i_{p}\in C_{j}}\frac{i_{p}}{|C_{j}|}
Step 5 − Compute the total quantization error as follows −
E\:=\:\sum_{j=1}^k\sum_{i_{p}\in C_{j}}|i_{p}\:-\:w_{j}|^2
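A minimal sketch of steps 1-5 in Python/NumPy, assuming Euclidean distance; the sample data is synthetic and purely illustrative.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Minimal k-means following steps 1-5 above. X: (n, d) array of input vectors."""
    rng = np.random.default_rng(seed)
    # Step 1: initialise the k prototypes with randomly chosen input vectors
    w = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(max_iter):
        # Step 3: put each input vector in the cluster with the nearest prototype
        labels = np.argmin(((X[:, None, :] - w[None, :, :]) ** 2).sum(axis=2), axis=1)
        # Step 4: move each prototype to the centroid of its current members
        new_w = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else w[j]
                          for j in range(k)])
        # Step 2: stop once the prototypes (and hence memberships) no longer change
        if np.allclose(new_w, w):
            break
        w = new_w
    # Step 5: total quantization error E
    E = ((X - w[labels]) ** 2).sum()
    return w, labels, E

# two synthetic blobs around (0, 0) and (1, 1)
X = np.vstack([np.random.default_rng(1).normal(c, 0.1, size=(20, 2)) for c in (0.0, 1.0)])
w, labels, E = kmeans(X, k=2)
print("prototypes:\n", w, "\nE =", E)
```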
Neocognitron
It is a multilayer feed-forward network, developed by Fukushima in the 1980s. This model is based on supervised learning and is used for visual pattern recognition, mainly hand-written characters. It is basically an extension of the Cognitron network, which was also developed by Fukushima, in 1975.
Architecture
It is a hierarchical network comprising many layers, with a local pattern of connectivity in those layers.
As seen in the diagram above, the neocognitron is divided into different connected layers, and each layer has two types of cells. These cells are explained as follows −
S-Cell − It is called a simple cell, which is trained to respond to a particular pattern or a group of patterns.
C-Cell − It is called a complex cell, which combines the outputs from the S-cells and simultaneously lessens the number of units in each array. In another sense, the C-cell displaces the result of the S-cell.
Training Algorithm
The neocognitron is trained layer by layer. The weights from the input layer to the first layer are trained and then frozen. Then the weights from the first layer to the second layer are trained, and so on. The internal calculations between an S-cell and a C-cell depend upon the weights coming from the previous layers. Hence, we can say that the training algorithm depends upon the calculations in the S-cell and the C-cell.
Calculations in S-cell
The S-cell possesses an excitatory signal received from the previous layer and inhibitory signals obtained within the same layer.
\theta\:=\:\sqrt{\displaystyle\sum\limits_{i} t_{i}\:c_{i}^2}
Here, ti is the fixed weight and ci is the output from C-cell.
The scaled input of the S-cell can be calculated as follows −
x\:=\:\frac{1\:+\:e}{1\:+\:vw_{0}}\:-\:1
Here, $e\:=\:\sum_i c_{i}w_{i}$
wi is the adjustable weight from the C-cell to the S-cell.

w0 is the adjustable weight between the input and the S-cell.

v is the excitatory input from the C-cell.
The activation of the output signal is −
s\:=\:\begin{cases}x, & if\:x \geq 0\\0, & if\:x < 0\end{cases}
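A minimal sketch of a single S-cell computation under the formulas above; all the numeric values (c, w, w0, v) are made-up illustrations.

```python
import numpy as np

def s_cell_output(c, w, w0, v):
    """Output of one S-cell using the scaled-input formula above.

    c  -- outputs of the C-cells feeding this S-cell
    w  -- adjustable weights w_i from the C-cells to the S-cell
    w0 -- adjustable weight between the input and the S-cell
    v  -- excitatory input from the C-cell
    """
    e = np.dot(c, w)                      # e = sum_i c_i * w_i
    x = (1.0 + e) / (1.0 + v * w0) - 1.0  # scaled input of the S-cell
    return x if x >= 0 else 0.0           # s = x if x >= 0 else 0

# illustrative numbers only
print(s_cell_output(c=np.array([0.5, 0.8]),
                    w=np.array([0.4, 0.6]),
                    w0=0.5, v=0.7))
```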
Calculations in C-cell
The net input of the C-layer is −
C\:=\:\displaystyle\sum\limits_i s_{i}x_{i}
Here, si is the output from the S-cell and xi is the fixed weight from the S-cell to the C-cell.
The final output is as follows −
C_{out}\:=\:\begin{cases}\frac{C}{a+C}, & if\:C > 0\\0, & otherwise\end{cases}
Here, ‘a’ is a parameter that depends on the performance of the network.
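Similarly, here is a minimal sketch of a single C-cell computation using the net-input and output formulas above; the values of s, the fixed weights, and ‘a’ are made-up illustrations.

```python
import numpy as np

def c_cell_output(s, x_w, a=0.5):
    """Output of one C-cell using the formulas above.

    s   -- outputs of the S-cells feeding this C-cell
    x_w -- fixed weights x_i from the S-cells to the C-cell
    a   -- parameter depending on the performance of the network
    """
    C = np.dot(s, x_w)                  # net input: C = sum_i s_i * x_i
    return C / (a + C) if C > 0 else 0.0

# illustrative numbers only, e.g. feeding in S-cell outputs computed earlier
print(c_cell_output(s=np.array([0.24, 0.1]), x_w=np.array([1.0, 1.0])))
```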