Artificial Neural Network - Concise Tutorial

Other Optimization Techniques

Iterated Gradient Descent Technique

Gradient descent, also known as steepest descent, is an iterative optimization algorithm for finding a local minimum of a function. While minimizing the function, we are concerned with the cost or error to be minimized (recall the Travelling Salesman Problem). It is extensively used in deep learning and is useful in a wide variety of situations. The point to remember here is that we are concerned with local optimization, not global optimization.

Main Working Idea

We can understand the main working idea of gradient descent with the help of the following steps −

  1. First, start with an initial guess of the solution.

  2. Then, take the gradient of the function at that point.

  3. Later, repeat the process by stepping the solution in the negative direction of the gradient.

By following the above steps, the algorithm will eventually converge where the gradient is zero.

Mathematical Concept

Suppose we have a function f(x) and we are trying to find the minimum of this function. The following are the steps to find the minimum of f(x).

  1. First, give some initial value $x_{0}$ for $x$.

  2. Now take the gradient $\nabla f$ of the function, with the intuition that the gradient gives the slope of the curve at that $x$ and its direction points towards the increase of the function, in order to find the best direction in which to minimize it.

  3. Now change $x$ as follows − $x_{n+1}\:=\:x_{n}\:-\:\theta \nabla f(x_{n})$

Here, θ > 0 is the learning rate (step size) that forces the algorithm to take small jumps.
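
To make the update rule concrete, here is a minimal Python sketch. The function f(x) = x², its gradient, the starting point, and the step size are all illustrative assumptions chosen for the example, not values taken from the text above.

```python
# Minimal gradient descent sketch on the illustrative function f(x) = x**2,
# whose gradient (derivative) is 2*x.

def grad_f(x):
    return 2 * x

x = 5.0         # step 1: initial guess x_0 (assumed value)
theta = 0.1     # step size theta > 0 (assumed value)

for n in range(100):
    g = grad_f(x)           # step 2: gradient at the current point
    if abs(g) < 1e-8:       # converged: gradient is (numerically) zero
        break
    x = x - theta * g       # step 3: x_{n+1} = x_n - theta * grad_f(x_n)

print(x)   # approaches the minimizer x = 0
```

With this quadratic, each iteration shrinks x by a factor of (1 − 2θ) = 0.8, so the iterates approach the minimizer x = 0 geometrically.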

Estimating Step Size

In practice, a wrong step size θ may fail to reach convergence, so a careful selection of the step size is very important. The following points must be remembered while choosing the step size −

  1. Do not choose a step size that is too large, otherwise it will have a negative impact, i.e. the algorithm will diverge rather than converge.

  2. Do not choose a step size that is too small, otherwise it will take a lot of time to converge.

Some options with regard to choosing the step size −

  1. One option is to choose a fixed step size.

  2. Another option is to choose a different step size for every iteration, for example by following a decay schedule, as sketched below.
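
As a rough illustration of the second option, the step size can be recomputed at every iteration from a decay schedule. The schedule below and its parameter values are just one common assumption, not the only possible choice.

```python
def step_size(n, theta0=0.5, decay=0.1):
    """Illustrative decaying schedule: larger steps early on, smaller steps later."""
    return theta0 / (1 + decay * n)

# Used inside the gradient descent loop from the earlier sketch:
#     x = x - step_size(n) * grad_f(x)
```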

Simulated Annealing

The basic concept of Simulated Annealing (SA) is motivated by annealing in solids. In the process of annealing, if we heat a metal above its melting point and then cool it down, the structural properties will depend upon the rate of cooling. We can also say that SA simulates the metallurgical process of annealing.

Use in ANN

SA is a stochastic computational method, inspired by the annealing analogy, for approximating the global optimum of a given function. We can use SA to train feed-forward neural networks.

Algorithm

Step 1 − Generate a random solution.

Step 2 − Calculate its cost using some cost function.

Step 3 − Generate a random neighboring solution.

Step 4 − Calculate the cost of the new solution using the same cost function.

Step 5 − Compare the cost of the new solution with that of the old solution as follows −

If $Cost_{New\:Solution} < Cost_{Old\:Solution}$, then move to the new solution. Otherwise, the worse solution may still be accepted with a probability that depends on the cost difference and on a temperature parameter, which is gradually lowered over the iterations.

Step 6 − Test for the stopping condition, which may be that the maximum number of iterations has been reached or that an acceptable solution has been obtained.
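
A compact Python sketch of these six steps is shown below. The cost function, the neighbor generator, the initial temperature, and the cooling rate are all illustrative assumptions; when training a feed-forward network, the solution would instead be the weight vector and the cost would be the training error.

```python
import math
import random

def cost(x):
    # Illustrative cost function with several local minima.
    return x * x + 10 * math.sin(x)

def neighbor(x):
    # Step 3: generate a random neighboring solution.
    return x + random.uniform(-1.0, 1.0)

def simulated_annealing(max_iters=1000, temp=10.0, cooling=0.99):
    x = random.uniform(-10, 10)   # Step 1: random initial solution
    c = cost(x)                   # Step 2: its cost
    for _ in range(max_iters):    # Step 6: stop after max_iters iterations
        x_new = neighbor(x)       # Step 3: random neighboring solution
        c_new = cost(x_new)       # Step 4: cost of the new solution
        # Step 5: always accept a better solution; accept a worse one with a
        # probability that shrinks as the temperature is lowered.
        if c_new < c or random.random() < math.exp((c - c_new) / temp):
            x, c = x_new, c_new
        temp *= cooling           # gradually cool down
    return x, c

print(simulated_annealing())
```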