Theano Tutorial
Theano - Trivial Training Example
Theano is quite useful in training neural networks, where we have to repeatedly calculate the cost and gradients to reach an optimum. On large datasets, this becomes computationally intensive. Theano does this efficiently thanks to the internal optimizations of the computational graph that we saw earlier.
Problem Statement
We shall now learn how to use the Theano library to train a network. We will take a simple case where we start with a four-feature dataset. We compute the sum of these features after applying a certain weight (importance) to each feature.
The goal of the training is to modify the weights assigned to each feature so that the sum reaches a target value of 100.
sum = f1 * w1 + f2 * w2 + f3 * w3 + f4 * w4
Where f1, f2, … are the feature values and w1, w2, … are the weights.
Let me quantify the example for a better understanding of the problem statement. We will assume an initial value of 1.0 for each feature, and we will take w1 equal to 0.1, w2 equal to 0.25, w3 equal to 0.15, and w4 equal to 0.3. There is no definite logic in assigning the weight values; it is just our intuition. Thus, the initial sum is as follows −
sum = 1.0 * 0.1 + 1.0 * 0.25 + 1.0 * 0.15 + 1.0 * 0.3
This sums to 0.8. Now, we will keep modifying the weight assignment so that this sum approaches 100. The current value of 0.8 is far from our desired target of 100. In machine learning terms, we define the cost as the difference between the target value and the current output value, typically squared to magnify the error. We reduce this cost in each iteration by calculating the gradients and updating the weights vector.
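The arithmetic of this cost can be sketched by hand in plain NumPy (a worked check of the numbers above, not using Theano itself):

```python
import numpy as np

# Feature values and initial weights from the example above
x = np.array([1.0, 1.0, 1.0, 1.0])
W = np.array([0.1, 0.25, 0.15, 0.3])
target = 100.0

y = (x * W).sum()          # weighted sum of the features: 0.8
cost = (target - y) ** 2   # squared error: (100 - 0.8)**2, about 9840.64

print(y, cost)
```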
Let us see how this entire logic is implemented in Theano.
Declaring Variables
We first declare our input vector x as follows −
x = tensor.fvector('x')
Where x is a one-dimensional array of float values.
We define a scalar target variable as given below −
target = tensor.fscalar('target')
Next, we create a weights tensor W with the initial values as discussed above −
W = theano.shared(numpy.asarray([0.1, 0.25, 0.15, 0.3]), 'W')
Defining Theano Expression
We now calculate the output using the following expression −
y = (x * W).sum()
Note that in the above statement, x and W are vectors, not simple scalar variables. We now calculate the error (cost) with the following expression −
cost = tensor.sqr(target - y)
The cost is the difference between the target value and the current output, squared.
To calculate the gradient, which tells us how to adjust the weights to reduce the cost, we use the built-in grad method as follows −
gradients = tensor.grad(cost, [W])
We now update the weights vector by taking a learning rate of 0.1 as follows −
W_updated = W - (0.1 * gradients[0])
Next, we register this update pair so that Theano applies it to the weights vector each time the function is called. We do this in the following statement −
updates = [(W, W_updated)]
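For this squared-error cost the gradient works out analytically to d(cost)/dW = -2 * (target - y) * x, so the update above adds 0.2 * (target - y) * x to the weights. A plain-NumPy sketch of the very first update step (hand-rolled; the real program lets Theano's grad do this):

```python
import numpy as np

x = np.array([1.0, 1.0, 1.0, 1.0])
W = np.array([0.1, 0.25, 0.15, 0.3])
target = 100.0

y = (x * W).sum()                   # 0.8, as computed earlier
gradient = -2.0 * (target - y) * x  # analytic gradient of (target - y)**2
W_updated = W - 0.1 * gradient      # one step with learning rate 0.1

print(W_updated)
```

This reproduces the weights printed at iteration 0 of the program output, approximately [19.94 20.09 19.99 20.14].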
Defining/Invoking Theano Function
Lastly, we define a function in Theano that computes the sum and applies the weight updates.
f = function([x, target], y, updates=updates)
To invoke the above function a certain number of times, we create a for loop as follows −
for i in range(10):
    output = f([1.0, 1.0, 1.0, 1.0], 100.0)
As mentioned earlier, the input to the function is a vector containing the initial values for the four features; we assign the value 1.0 to each feature without any specific reason. You may assign different values of your choice and check whether the function still converges. We will print the values of the weight vector and the corresponding output in each iteration, as shown in the code below −
    print("iteration: ", i)
    print("Modified Weights: ", W.get_value())
    print("Output: ", output)
Full Program Listing
The complete program listing is reproduced here for your quick reference −
import theano
from theano import *
import numpy
x = tensor.fvector('x')
target = tensor.fscalar('target')
W = theano.shared(numpy.asarray([0.1, 0.25, 0.15, 0.3]), 'W')
print("Weights: ", W.get_value())
y = (x * W).sum()
cost = tensor.sqr(target - y)
gradients = tensor.grad(cost, [W])
W_updated = W - (0.1 * gradients[0])
updates = [(W, W_updated)]
f = function([x, target], y, updates=updates)
for i in range(10):
    output = f([1.0, 1.0, 1.0, 1.0], 100.0)
    print("iteration: ", i)
    print("Modified Weights: ", W.get_value())
    print("Output: ", output)
When you run the program, you will see the following output −
Weights: [0.1 0.25 0.15 0.3 ]
iteration: 0
Modified Weights: [19.94 20.09 19.99 20.14]
Output: 0.8
iteration: 1
Modified Weights: [23.908 24.058 23.958 24.108]
Output: 80.16000000000001
iteration: 2
Modified Weights: [24.7016 24.8516 24.7516 24.9016]
Output: 96.03200000000001
iteration: 3
Modified Weights: [24.86032 25.01032 24.91032 25.06032]
Output: 99.2064
iteration: 4
Modified Weights: [24.892064 25.042064 24.942064 25.092064]
Output: 99.84128
iteration: 5
Modified Weights: [24.8984128 25.0484128 24.9484128 25.0984128]
Output: 99.968256
iteration: 6
Modified Weights: [24.89968256 25.04968256 24.94968256 25.09968256]
Output: 99.9936512
iteration: 7
Modified Weights: [24.89993651 25.04993651 24.94993651 25.09993651]
Output: 99.99873024
iteration: 8
Modified Weights: [24.8999873 25.0499873 24.9499873 25.0999873]
Output: 99.99974604799999
iteration: 9
Modified Weights: [24.89999746 25.04999746 24.94999746 25.09999746]
Output: 99.99994920960002
Observe that by iteration 5 the output is 99.97 and by iteration 6 it is 99.99, which is close to our desired target of 100.0.
Depending on the desired accuracy, you may safely conclude that the network is trained within five or six iterations. After training completes, look up the weights vector, which at iteration 5 takes the following values −
iteration: 5
Modified Weights: [24.8984128 25.0484128 24.9484128 25.0984128]
You may now use these values in your network when deploying the model.
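As a cross-check, the same training loop can be reproduced in plain NumPy with the analytic gradient (a sketch that mirrors the Theano program above; it does not use Theano itself):

```python
import numpy as np

x = np.array([1.0, 1.0, 1.0, 1.0])
W = np.array([0.1, 0.25, 0.15, 0.3])
target = 100.0

for i in range(10):
    y = (x * W).sum()
    gradient = -2.0 * (target - y) * x  # gradient of (target - y)**2 w.r.t. W
    W = W - 0.1 * gradient              # same update rule as the Theano program

print((x * W).sum())  # essentially 100, matching the converged output above
```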