Artificial Neural Network Tutorial

Supervised Learning

As the name suggests, supervised learning takes place under the supervision of a teacher: the learning process depends on a teacher that supplies the desired output for every input. During the training of an ANN under supervised learning, the input vector is presented to the network, which produces an output vector. This output vector is compared with the desired/target output vector. An error signal is generated if there is a difference between the actual output and the desired/target output. On the basis of this error signal, the weights are adjusted until the actual output matches the desired output.

Perceptron

Developed by Frank Rosenblatt using the McCulloch and Pitts model, the perceptron is the basic operational unit of artificial neural networks. It employs a supervised learning rule and is able to classify data into two classes.

Operational characteristics of the perceptron: it consists of a single neuron with an arbitrary number of inputs along with adjustable weights, but the output of the neuron is 1 or 0 depending upon the threshold. It also has a bias, whose input is always 1. The following figure gives a schematic representation of the perceptron.

[Figure: Schematic representation of the perceptron]

Perceptron thus has the following three basic elements −

  1. Links − It has a set of connection links, each of which carries a weight, including a bias whose input is always 1.

  2. Adder − It adds the inputs after they are multiplied by their respective weights.

  3. Activation function − It limits the output of the neuron. The most basic activation function is the Heaviside step function, which has two possible outputs: it returns 1 if the input is positive and 0 for any negative input. (A minimal code sketch of these three elements follows this list.)
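To make these three elements concrete, here is a minimal Python sketch; the function name, example weight values and threshold are illustrative assumptions, not taken from the text. It combines the weighted adder with a Heaviside step activation:

```python
import numpy as np

def perceptron_output(x, w, b, theta=0.0):
    """One perceptron unit: links carry the weights, the adder forms the net
    input, and a Heaviside step activation limits the output to 1 or 0."""
    y_in = b + np.dot(x, w)          # adder: bias plus weighted sum of the inputs
    return 1 if y_in > theta else 0  # activation: Heaviside step with threshold theta

# Illustrative call with hand-picked weights
print(perceptron_output(np.array([1, 0]), np.array([0.6, 0.6]), b=-0.5))  # prints 1
```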

Training Algorithm

A perceptron network can be trained for a single output unit as well as for multiple output units.

Training Algorithm for Single Output Unit

Step 1 − Initialize the following to start the training −

  1. Weights

  2. Bias

  3. Learning rate $\alpha$

For easy calculation and simplicity, the weights and bias are initially set to 0 and the learning rate is set to 1.

Step 2 − Continue steps 3-8 while the stopping condition is not true.

Step 3 − Continue steps 4-6 for every training vector x.

Step 4 − Activate each input unit as follows −

$$x_{i} = s_{i}\;\;\;(i = 1 \text{ to } n)$$

Step 5 − Now obtain the net input with the following relation −

$$y_{in} = b + \displaystyle\sum\limits_{i=1}^{n} x_{i}\,w_{i}$$

Here ‘b’ is the bias and ‘n’ is the total number of input neurons.

Step 6 − Apply the following activation function to obtain the final output.

$$f(y_{in}) = \begin{cases}1 & \text{if } y_{in} > \theta \\ 0 & \text{if } -\theta \leqslant y_{in} \leqslant \theta \\ -1 & \text{if } y_{in} < -\theta \end{cases}$$

Step 7 − Adjust the weight and bias as follows −

Case 1 − if y ≠ t then,

$$w_{i}(new) = w_{i}(old) + \alpha\,t\,x_{i}$$

$$b(new) = b(old) + \alpha\,t$$

Case 2 − if y = t then,

$$w_{i}(new) = w_{i}(old)$$

$$b(new) = b(old)$$

Here ‘y’ is the actual output and ‘t’ is the desired/target output.

Step 8 − Test for the stopping condition, which occurs when there is no change in the weights.

Training Algorithm for Multiple Output Units

The following diagram shows the architecture of the perceptron for multiple output classes.

[Figure: Perceptron architecture for multiple output classes]

Step 1 − Initialize the following to start the training −

  1. Weights

  2. Bias

  3. Learning rate $\alpha$

For easy calculation and simplicity, the weights and bias are initially set to 0 and the learning rate is set to 1.

Step 2 − Continue steps 3-8 while the stopping condition is not true.

Step 3 − Continue steps 4-6 for every training vector x.

Step 4 − Activate each input unit as follows −

$$x_{i} = s_{i}\;\;\;(i = 1 \text{ to } n)$$

Step 5 − Obtain the net input at each output unit with the following relation −

$$y_{inj} = b_{j} + \displaystyle\sum\limits_{i=1}^{n} x_{i}\,w_{ij}$$

Here ‘bj’ is the bias of output unit j and ‘n’ is the total number of input neurons.

Step 6 − Apply the following activation function to obtain the final output for each output unit j = 1 to m −

$$f(y_{inj}) = \begin{cases}1 & \text{if } y_{inj} > \theta \\ 0 & \text{if } -\theta \leqslant y_{inj} \leqslant \theta \\ -1 & \text{if } y_{inj} < -\theta \end{cases}$$

Step 7 − Adjust the weight and bias for i = 1 to n and j = 1 to m as follows −

Case 1 − if yj ≠ tj then,

$$w_{ij}(new) = w_{ij}(old) + \alpha\,t_{j}\,x_{i}$$

$$b_{j}(new) = b_{j}(old) + \alpha\,t_{j}$$

Case 2 − if yj = tj then,

$$w_{ij}(new) = w_{ij}(old)$$

$$b_{j}(new) = b_{j}(old)$$

Here ‘yj’ is the actual output and ‘tj’ is the desired/target output of output unit j.

Step 8 − Test for the stopping condition, which occurs when there is no change in the weights.

Adaptive Linear Neuron (Adaline)

Adaline, which stands for Adaptive Linear Neuron, is a network having a single linear unit. It was developed by Widrow and Hoff in 1960. Some important points about Adaline are as follows −

  1. It uses a bipolar activation function.

  2. It uses the delta rule for training to minimize the Mean Squared Error (MSE) between the actual output and the desired/target output.

  3. The weights and the bias are adjustable.

Architecture

The basic structure of Adaline is similar to the perceptron, with an extra feedback loop with the help of which the actual output is compared with the desired/target output. After comparison on the basis of the training algorithm, the weights and bias are updated.

[Figure: Adaline architecture]

Training Algorithm

Step 1 − Initialize the following to start the training −

  1. Weights

  2. Bias

  3. Learning rate $\alpha$

For easy calculation and simplicity, the weights and bias are initially set to 0 and the learning rate is set to 1.

Step 2 − Continue steps 3-8 while the stopping condition is not true.

Step 3 − Continue steps 4-6 for every bipolar training pair s:t.

Step 4 − Activate each input unit as follows −

$$x_{i} = s_{i}\;\;\;(i = 1 \text{ to } n)$$

Step 5 − Obtain the net input with the following relation −

$$y_{in} = b + \displaystyle\sum\limits_{i=1}^{n} x_{i}\,w_{i}$$

Here ‘b’ is the bias and ‘n’ is the total number of input neurons.

Step 6 − Apply the following activation function to obtain the final output −

$$f(y_{in}) = \begin{cases}1 & \text{if } y_{in} \geqslant 0 \\ -1 & \text{if } y_{in} < 0 \end{cases}$$

Step 7 − Adjust the weight and bias as follows −

Case 1 − if y ≠ t then,

$$w_{i}(new) = w_{i}(old) + \alpha(t - y_{in})x_{i}$$

$$b(new) = b(old) + \alpha(t - y_{in})$$

Case 2 − if y = t then,

$$w_{i}(new) = w_{i}(old)$$

$$b(new) = b(old)$$

Here ‘y’ is the actual output and ‘t’ is the desired/target output.

$(t - y_{in})$ is the computed error.

Step 8 − Test for the stopping condition, which occurs when there is no change in the weights or when the highest weight change that occurred during training is smaller than the specified tolerance.
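The following Python sketch mirrors the Adaline procedure above; the learning rate, tolerance and epoch cap are illustrative choices, not values prescribed by the text.

```python
import numpy as np

def train_adaline(samples, targets, alpha=0.1, tol=1e-3, max_epochs=1000):
    """Adaline trained with the delta rule, following Steps 1-8 above."""
    w, b = np.zeros(samples.shape[1]), 0.0           # Step 1
    for _ in range(max_epochs):                      # Step 2
        largest_change = 0.0
        for x, t in zip(samples, targets):           # Steps 3-4: bipolar pairs s:t
            y_in = b + np.dot(x, w)                  # Step 5: net input
            y = 1 if y_in >= 0 else -1               # Step 6: bipolar activation
            if y != t:                               # Step 7, Case 1
                delta = alpha * (t - y_in)           # delta rule uses the error (t - y_in)
                largest_change = max(largest_change, np.max(np.abs(delta * x)))
                w = w + delta * x
                b = b + delta
        if largest_change < tol:                     # Step 8: stopping condition
            break
    return w, b
```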

Multiple Adaptive Linear Neuron (Madaline)

Madaline, which stands for Multiple Adaptive Linear Neuron, is a network that consists of many Adalines in parallel. It has a single output unit. Some important points about Madaline are as follows −

  1. It is just like a multilayer perceptron, where the Adalines act as hidden units between the input and the Madaline layer.

  2. The weights and the bias between the input and the Adaline layer, as we see in the Adaline architecture, are adjustable.

  3. The weights and the bias between the Adaline and the Madaline layer are fixed at 1.

  4. Training can be done with the help of the delta rule.

Architecture

The architecture of Madaline consists of “n” neurons in the input layer, “m” neurons in the Adaline layer, and 1 neuron in the Madaline layer. The Adaline layer can be considered as the hidden layer, since it lies between the input layer and the output layer, i.e. the Madaline layer.

[Figure: Madaline architecture]

Training Algorithm

By now we know that only the weights and bias between the input and the Adaline layer are to be adjusted, while the weights and bias between the Adaline and the Madaline layer are fixed.

Step 1 − Initialize the following to start the training −

  1. Weights

  2. Bias

  3. Learning rate $\alpha$

For easy calculation and simplicity, the weights and bias are initially set to 0 and the learning rate is set to 1.

Step 2 − Continue steps 3-8 while the stopping condition is not true.

Step 3 − Continue steps 4-7 for every bipolar training pair s:t.

Step 4 − Activate each input unit as follows −

$$x_{i} = s_{i}\;\;\;(i = 1 \text{ to } n)$$

Step 5 − Obtain the net input at each hidden (Adaline) unit with the following relation −

$$Q_{inj} = b_{j} + \displaystyle\sum\limits_{i=1}^{n} x_{i}\,w_{ij}\;\;\;\;j = 1 \text{ to } m$$

Here ‘bj’ is the bias of Adaline unit j and ‘n’ is the total number of input neurons.

Step 6 − Apply the following activation function to obtain the final output at the Adaline and the Madaline layer −

$$f(x) = \begin{cases}1 & \text{if } x \geqslant 0 \\ -1 & \text{if } x < 0 \end{cases}$$

Output at the hidden (Adaline) unit −

$$Q_{j} = f(Q_{inj})$$

Final output of the network −

$$y = f(y_{in})$$

i.e. $\;y_{in} = b_{0} + \sum_{j=1}^{m} Q_{j}\,v_{j}$

Step 7 − Calculate the error and adjust the weights as follows −

Case 1 − if y ≠ t and t = 1 then,

$$w_{ij}(new) = w_{ij}(old) + \alpha(1 - Q_{inj})x_{i}$$

$$b_{j}(new) = b_{j}(old) + \alpha(1 - Q_{inj})$$

In this case, the weights are updated on the unit Qj whose net input is closest to 0, because t = 1.

Case 2 − if y ≠ t and t = -1 then,

$$w_{ik}(new) = w_{ik}(old) + \alpha(-1 - Q_{ink})x_{i}$$

$$b_{k}(new) = b_{k}(old) + \alpha(-1 - Q_{ink})$$

In this case, the weights are updated on every unit Qk whose net input is positive, because t = -1.

Here ‘y’ is the actual output and ‘t’ is the desired/target output.

Case 3 − if y = t then,

There is no change in the weights.

Step 8 − Test for the stopping condition, which occurs when there is no change in the weights or when the highest weight change that occurred during training is smaller than the specified tolerance.
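A possible Python sketch of this Madaline rule is given below. The number of Adaline units, the small random initialisation of the trainable weights and the learning rate are illustrative assumptions; only the input-to-Adaline weights and biases are trained, while the Adaline-to-Madaline weights and bias stay fixed at 1 as stated above.

```python
import numpy as np

def bipolar(a):
    return np.where(a >= 0, 1, -1)               # Step 6 activation

def train_madaline(samples, targets, m=2, alpha=0.5, max_epochs=100, seed=0):
    """Madaline with one output unit, following Steps 1-8 above."""
    rng = np.random.default_rng(seed)
    n = samples.shape[1]
    W = rng.normal(scale=0.05, size=(n, m))      # trainable input -> Adaline weights w_ij
    b = np.zeros(m)                              # trainable Adaline biases b_j
    v, b0 = np.ones(m), 1.0                      # fixed Adaline -> Madaline weights and bias

    for _ in range(max_epochs):                  # Step 2
        for x, t in zip(samples, targets):       # Steps 3-4
            Q_in = b + x @ W                     # Step 5: Adaline net inputs Q_inj
            Q = bipolar(Q_in)                    # Adaline outputs Q_j
            y = bipolar(b0 + Q @ v)              # final Madaline output
            if y == t:                           # Case 3: no change
                continue
            if t == 1:                           # Case 1: push the unit closest to 0 towards +1
                j = int(np.argmin(np.abs(Q_in)))
                W[:, j] += alpha * (1 - Q_in[j]) * x
                b[j] += alpha * (1 - Q_in[j])
            else:                                # Case 2: push positive-input units towards -1
                for k in np.where(Q_in > 0)[0]:
                    W[:, k] += alpha * (-1 - Q_in[k]) * x
                    b[k] += alpha * (-1 - Q_in[k])
    return W, b, v, b0
```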

Back Propagation Neural Networks

Back Propagation Neural (BPN) is a multilayer neural network consisting of an input layer, at least one hidden layer and an output layer. As its name suggests, back propagation takes place in this network. The error, which is calculated at the output layer by comparing the target output and the actual output, is propagated back towards the input layer.

Architecture

As shown in the diagram, the architecture of BPN has three interconnected layers with weights on them. The hidden layer as well as the output layer also has a bias, whose input is always 1, attached to it. As is clear from the diagram, the working of BPN has two phases. One phase sends the signal from the input layer to the output layer, and the other phase back-propagates the error from the output layer to the input layer.

[Figure: Back Propagation Neural Network architecture]

Training Algorithm

For training, BPN uses the binary sigmoid activation function. The training of BPN has the following three phases.

  1. Phase 1 − Feed Forward Phase

  2. Phase 2 − Back Propagation of error

  3. Phase 3 − Updating of weights

All these steps are summarized in the algorithm as follows.

Step 1 − Initialize the following to start the training −

  1. Weights

  2. Learning rate $\alpha$

For easy calculation and simplicity, take some small random values.

Step 2 − Continue steps 3-11 while the stopping condition is not true.

Step 3 − Continue steps 4-10 for every training pair.

Phase 1

Step 4 − Each input unit receives an input signal xi and sends it to the hidden units, for all i = 1 to n.

Step 5 − Calculate the net input at the hidden unit using the following relation −

$$Q_{inj} = b_{0j} + \sum_{i=1}^{n} x_{i}\,v_{ij}\;\;\;\;j = 1 \text{ to } p$$

Here b0j is the bias on hidden unit j, and vij is the weight on unit j of the hidden layer coming from unit i of the input layer.

Now calculate the output of the hidden unit by applying the following activation function −

$$Q_{j} = f(Q_{inj})$$

Send these output signals of the hidden layer units to the output layer units.

Step 6 − Calculate the net input at the output layer unit using the following relation −

$$y_{ink} = b_{0k} + \sum_{j=1}^{p} Q_{j}\,w_{jk}\;\;\;\;k = 1 \text{ to } m$$

Here b0k is the bias on output unit k, and wjk is the weight on unit k of the output layer coming from unit j of the hidden layer.

Calculate the final output by applying the following activation function −

$$y_{k} = f(y_{ink})$$

Phase 2

Step 7 − Compute the error-correcting term, in correspondence with the target pattern received at each output unit, as follows −

$$\delta_{k} = (t_{k} - y_{k})\,f^{'}(y_{ink})$$
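For the binary sigmoid $f(x) = \frac{1}{1 + e^{-x}}$ used by BPN, the derivative takes the convenient form $f^{'}(y_{ink}) = f(y_{ink})[1 - f(y_{ink})] = y_{k}(1 - y_{k})$, so it can be computed directly from the output of Step 6.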

On this basis, compute the weight and bias correction terms as follows −

$$\Delta w_{jk} = \alpha\,\delta_{k}\,Q_{j}$$

$$\Delta b_{0k} = \alpha\,\delta_{k}$$

Then, send $\delta_{k}$ back to the hidden layer.

Step 8 − Now each hidden unit sums the delta inputs it receives from the output units −

$$\delta_{inj} = \displaystyle\sum\limits_{k=1}^{m} \delta_{k}\,w_{jk}$$

The error term can then be calculated as follows −

$$\delta_{j} = \delta_{inj}\,f^{'}(Q_{inj})$$

On this basis, compute the weight and bias correction terms as follows −

$$\Delta v_{ij} = \alpha\,\delta_{j}\,x_{i}$$

$$\Delta b_{0j} = \alpha\,\delta_{j}$$

Phase 3

Step 9 − Each output unit (yk, k = 1 to m) updates its weight and bias as follows −

$$w_{jk}(new) = w_{jk}(old) + \Delta w_{jk}$$

$$b_{0k}(new) = b_{0k}(old) + \Delta b_{0k}$$

Step 10 − Each hidden unit (Qj, j = 1 to p) updates its weight and bias as follows −

$$v_{ij}(new) = v_{ij}(old) + \Delta v_{ij}$$

$$b_{0j}(new) = b_{0j}(old) + \Delta b_{0j}$$

Step 11 − Check for the stopping condition, which may be either reaching the specified number of epochs or the target output matching the actual output.
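As a summary of the three phases, here is a minimal Python sketch of one possible implementation; the hidden layer size, learning rate, epoch count and initial weight range are illustrative assumptions.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))              # binary sigmoid; f'(a) = f(a)(1 - f(a))

def train_bpn(samples, targets, p=4, alpha=0.25, epochs=1000, seed=0):
    """One-hidden-layer BPN trained with Steps 1-11 above."""
    rng = np.random.default_rng(seed)
    n, m = samples.shape[1], targets.shape[1]
    v = rng.uniform(-0.5, 0.5, size=(n, p)); b0j = np.zeros(p)   # Step 1: small random values
    w = rng.uniform(-0.5, 0.5, size=(p, m)); b0k = np.zeros(m)
    for _ in range(epochs):                                      # Step 2
        for x, t in zip(samples, targets):                       # Step 3
            # Phase 1: feed forward (Steps 4-6)
            Q = sigmoid(b0j + x @ v)                             # hidden layer outputs Q_j
            y = sigmoid(b0k + Q @ w)                             # output layer outputs y_k
            # Phase 2: back propagation of error (Steps 7-8)
            delta_k = (t - y) * y * (1 - y)                      # uses f'(y_ink) = y_k (1 - y_k)
            delta_j = (delta_k @ w.T) * Q * (1 - Q)
            # Phase 3: weight and bias updates (Steps 9-10)
            w += alpha * np.outer(Q, delta_k); b0k += alpha * delta_k
            v += alpha * np.outer(x, delta_j); b0j += alpha * delta_j
    return v, b0j, w, b0k
```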

Generalized Delta Learning Rule

The delta rule works only for the output layer. On the other hand, the generalized delta rule, also called the back-propagation rule, is a way of creating the desired values of the hidden layer.

Mathematical Formulation

For the activation function $y_{k} = f(y_{ink})$, the net inputs at the output layer and at the hidden layer can be given by

$$y_{ink} = \displaystyle\sum\limits_{j} z_{j}\,w_{jk}$$

and $\;z_{inj} = \sum_i x_{i}\,v_{ij}$, where $z_{j} = f(z_{inj})$ is the output of hidden unit zj.

Now the error which has to be minimized is

$$E = \frac{1}{2}\displaystyle\sum\limits_{k}\,[t_{k} - y_{k}]^2$$

By using the chain rule, we have

$$\frac{\partial E}{\partial w_{jk}} = \frac{\partial }{\partial w_{jk}}\left(\frac{1}{2}\displaystyle\sum\limits_{k}\,[t_{k} - y_{k}]^2\right)$$

$$= \frac{\partial }{\partial w_{jk}}\left(\frac{1}{2}[t_{k} - f(y_{ink})]^2\right)$$

$$= -[t_{k} - y_{k}]\frac{\partial }{\partial w_{jk}}f(y_{ink})$$

$$= -[t_{k} - y_{k}]f^{'}(y_{ink})\frac{\partial }{\partial w_{jk}}(y_{ink})$$

$$= -[t_{k} - y_{k}]f^{'}(y_{ink})\,z_{j}$$

Now let us define $\delta_{k} = [t_{k} - y_{k}]f^{'}(y_{ink})$, so that $\frac{\partial E}{\partial w_{jk}} = -\delta_{k}\,z_{j}$. This matches the error-correcting term used in Step 7 of the training algorithm.

The gradient with respect to the weights on connections to the hidden unit zj can be obtained in the same way −

$$\frac{\partial E}{\partial v_{ij}} = -\displaystyle\sum\limits_{k}\delta_{k}\frac{\partial }{\partial v_{ij}}(y_{ink})$$

Putting in the value of $y_{ink}$, we get

$$\frac{\partial E}{\partial v_{ij}} = -\delta_{j}\,x_{i}\;\;\;\;\text{where}\;\;\delta_{j} = \displaystyle\sum\limits_{k}\delta_{k}\,w_{jk}\,f^{'}(z_{inj})$$

Weight updating can be done as follows −

For the output unit −

$$\Delta w_{jk} = -\alpha\frac{\partial E}{\partial w_{jk}} = \alpha\,\delta_{k}\,z_{j}$$

For the hidden unit −

$$\Delta v_{ij} = -\alpha\frac{\partial E}{\partial v_{ij}} = \alpha\,\delta_{j}\,x_{i}$$
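To sanity-check the derivation, the sketch below compares the analytic gradient $\frac{\partial E}{\partial w_{jk}} = -\delta_{k}\,z_{j}$ with a finite-difference estimate on a tiny sigmoid network; the network sizes, random seed and variable names are illustrative assumptions.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
x = np.array([0.3, -0.8])                      # one input pattern
t = np.array([1.0])                            # its target
v = rng.normal(scale=0.5, size=(2, 2))         # input -> hidden weights v_ij
bv = rng.normal(scale=0.5, size=2)             # hidden biases
w = rng.normal(scale=0.5, size=(2, 1))         # hidden -> output weights w_jk
bw = rng.normal(scale=0.5, size=1)             # output biases

def forward(w_mat):
    z = sigmoid(x @ v + bv)                    # hidden outputs z_j
    y = sigmoid(z @ w_mat + bw)                # network outputs y_k
    return 0.5 * np.sum((t - y) ** 2), z, y    # E = 1/2 sum_k (t_k - y_k)^2

E, z, y = forward(w)

# Analytic gradient from the generalized delta rule:
# dE/dw_jk = -delta_k * z_j with delta_k = (t_k - y_k) f'(y_ink) = (t_k - y_k) y_k (1 - y_k)
delta_k = (t - y) * y * (1 - y)
grad_analytic = -np.outer(z, delta_k)

# Finite-difference estimate for one weight, w[0, 0]
eps = 1e-6
w_plus, w_minus = w.copy(), w.copy()
w_plus[0, 0] += eps
w_minus[0, 0] -= eps
grad_numeric = (forward(w_plus)[0] - forward(w_minus)[0]) / (2 * eps)

print(grad_analytic[0, 0], grad_numeric)       # the two values agree closely
```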

Associative memory networks work on the basis of pattern association, which means they can store different patterns and, at the time of giving an output, they can produce one of the stored patterns by matching it with the given input pattern. These types of memories are also called Content-Addressable Memory (CAM). Associative memory performs a parallel search over the stored patterns, treated as data files.