Time Series: A Concise Tutorial
Time Series - Naive Methods
Introduction
Naive methods, such as taking the predicted value at time ‘t’ to be the actual value of the variable at time ‘t-1’, or the rolling mean of the series, serve as baselines. They are used to weigh how well statistical models and machine learning models perform and to show whether such models are needed.
In this chapter, let us try these methods on one of the features of our time-series data.
First, we shall look at the mean of the ‘temperature’ feature of our data and the deviation around it. It is also useful to see the maximum and minimum temperature values. We can use the functionality of the numpy library here.
Showing statistics
In [135]:
import numpy
# Summary statistics for the temperature column 'T'
print(
    'Mean: ', numpy.mean(df['T']), ';',
    'Standard Deviation: ', numpy.std(df['T']), ';',
    '\nMaximum Temperature: ', max(df['T']), ';',
    'Minimum Temperature: ', min(df['T'])
)
We have the statistics for all 9357 observations across an equally spaced timeline, which help us understand the data.
Now we will try the first naive method: set the predicted value at the present time equal to the actual value at the previous time, and calculate the root mean squared error (RMSE) to quantify the performance of this method.
Showing 1st naïve method
In [136]:
df['T']
df['T_t-1'] = df['T'].shift(1)
In [137]:
df_naive = df[['T','T_t-1']][1:]
In [138]:
from sklearn import metrics
from math import sqrt
true = df_naive['T']
prediction = df_naive['T_t-1']
error = sqrt(metrics.mean_squared_error(true,prediction))
print ('RMSE for Naive Method 1: ', error)
RMSE for Naive Method 1: 12.901140576492974
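The RMSE is the square root of the mean of the squared differences between actual and predicted values. As a minimal sketch, it can also be computed directly with numpy instead of sklearn. The function name `rmse` and the toy values below are illustrative assumptions and are not taken from the tutorial's dataset.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Toy values, made up for illustration (not from the dataset).
toy_actual = [3.0, 5.0, 7.0]
toy_predicted = [2.0, 5.0, 9.0]

# Errors are 1, 0 and -2, so the mean squared error is 5/3.
toy_error = rmse(toy_actual, toy_predicted)
print('RMSE on toy values: ', toy_error)
```

With the tutorial's data, the same function could be called as rmse(df_naive['T'], df_naive['T_t-1']).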
Let us see the next naive method, where the predicted value at the present time is set to the mean of the time periods preceding it. We will calculate the RMSE for this method too.
Showing 2nd naive method
In [139]:
df['T_rm'] = df['T'].rolling(3).mean().shift(1)
df_naive = df[['T','T_rm']].dropna()
In [140]:
true = df_naive['T']
prediction = df_naive['T_rm']
error = sqrt(metrics.mean_squared_error(true,prediction))
print ('RMSE for Naive Method 2: ', error)
RMSE for Naive Method 2: 14.957633272839242
Here, you can experiment with the number of previous time periods, also called ‘lags’, that you want to consider; it is kept as 3 here. In this data, the error increases as you increase the number of lags. If the lag is kept at 1, the method becomes the same as the first naive method used earlier.
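One way to run this experiment is to loop over several window sizes and compare the resulting RMSE values. The sketch below uses a synthetic random-walk series as a stand-in for the temperature column, since the tutorial's data is not reproduced here. The helper name `rolling_mean_rmse`, the window sizes and the synthetic series are all assumptions made for illustration, so the numbers it prints say nothing about the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df['T'] (an assumption, not the tutorial's data).
rng = np.random.default_rng(0)
series = pd.Series(20 + np.cumsum(rng.normal(0, 1, 500)))

def rolling_mean_rmse(values, window):
    """RMSE when each point is predicted as the mean of the previous `window` points."""
    prediction = values.rolling(window).mean().shift(1)
    valid = prediction.notna()  # the first `window` rows have no prediction
    diff = values[valid] - prediction[valid]
    return float(np.sqrt((diff ** 2).mean()))

rmse_by_window = {w: rolling_mean_rmse(series, w) for w in (1, 2, 3, 5, 10)}
for w, err in rmse_by_window.items():
    print('window =', w, ' RMSE =', err)

# With window=1 the rolling mean is just the previous value,
# so the result should match the first naive method.
shift_diff = (series - series.shift(1)).dropna()
shift_rmse = float(np.sqrt((shift_diff ** 2).mean()))
print('shift(1) RMSE =', shift_rmse)
```

To apply this to the tutorial's data, pass df['T'] in place of the synthetic series.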
Points to Note
- You can write a very simple function for calculating the root mean squared error. Here, we have used the mean squared error function from the package ‘sklearn’ and then taken its square root.
- In pandas, df['column_name'] can also be written as df.column_name. For this dataset, however, df.T will not work the same as df['T'], because df.T is the attribute that returns the transpose of a dataframe. So use only df['T'], or consider renaming this column before using the other syntax.
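The difference between df['T'] and df.T can be checked on a small frame. The sketch below uses a made-up dataframe; the frame name `demo`, the second column ‘RH’, the values and the new column name ‘temperature’ are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Small made-up frame with a column named 'T' (values are illustrative).
demo = pd.DataFrame({'T': [11.0, 12.5, 13.0], 'RH': [40.0, 42.0, 45.0]})

column = demo['T']      # selects the 'T' column as a Series
transposed = demo.T     # returns the transposed frame, not the column

print(type(column).__name__, column.shape)
print(type(transposed).__name__, transposed.shape)

# After renaming the column, attribute access is unambiguous.
renamed = demo.rename(columns={'T': 'temperature'})
print(renamed.temperature.tolist())
```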