Time Series 简明教程
Time Series - ARIMA
我们已经了解到,对于平稳时间序列,时间“t”处的变量是先前观测或残差误差的线性函数。因此,现在是时候将这两者结合起来,建立自回归移动平均 (ARMA) 模型了。
We have already understood that for a stationary time series a variable at time ‘t’ is a linear function of prior observations or residual errors. Hence it is time for us to combine the two and have an Auto-regressive moving average (ARMA) model.
然而,有时时间序列不是平稳的,即序列的统计特性(如均值、方差)随时间变化。而我们迄今为止学习过的统计模型假设时间序列是平稳的,因此,我们可以包括差分时间序列的预处理步骤,使其平稳。现在,对于我们正在处理的时间序列是否是平稳的,我们必须找出答案。
However, at times the time series is not stationary, i.e the statistical properties of a series like mean, variance changes over time. And the statistical models we have studied so far assume the time series to be stationary, therefore, we can include a pre-processing step of differencing the time series to make it stationary. Now, it is important for us to find out whether the time series we are dealing with is stationary or not.
用于查找时间序列平稳性的各种方法正在寻找时间序列图中的季节性或趋势,检查不同时间段的均值和方差差异、增强型迪基-福勒 (ADF) 检验、KPSS 检验、赫斯特指数等。
Various methods to find the stationarity of a time series are looking for seasonality or trend in the plot of time series, checking the difference in mean and variance for various time periods, Augmented Dickey-Fuller (ADF) test, KPSS test, Hurst’s exponent etc.
让我们使用 ADF 检验来确定数据集中的“温度”变量是否是平稳的时间序列。
Let us see whether the ‘temperature’ variable of our dataset is a stationary time series or not using ADF test.
In [74]:
from statsmodels.tsa.stattools import adfuller
result = adfuller(train)
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')
for key, value In result[4].items()
print('\t%s: %.3f' % (key, value))
ADF 统计量:-10.406056
ADF Statistic: -10.406056
p 值:0.000000
p-value: 0.000000
临界值:
Critical Values:
1%:-3.431
1%: -3.431
5%:-2.862
5%: -2.862
10%:-2.567
10%: -2.567
现在我们已经运行了 ADF 检验,让我们解释一下结果。首先,我们将 ADF 统计量与临界值进行比较,较低的临界值告诉我们该序列很可能是非平稳的。接下来,我们查看 p 值。大于 0.05 的 p 值也表明时间序列是非平稳的。
Now that we have run the ADF test, let us interpret the result. First we will compare the ADF Statistic with the critical values, a lower critical value tells us the series is most likely non-stationary. Next, we see the p-value. A p-value greater than 0.05 also suggests that the time series is non-stationary.
或者,p 值小于或等于 0.05,或 ADF 统计量小于临界值表明时间序列是平稳的。
Alternatively, p-value less than or equal to 0.05, or ADF Statistic less than critical values suggest the time series is stationary.
因此,我们正在处理的时间序列已经平稳。在平稳时间序列的情况下,我们将“d”参数设置为 0。
Hence, the time series we are dealing with is already stationary. In case of stationary time series, we set the ‘d’ parameter as 0.
我们还可以使用赫斯特指数确认时间序列的平稳性。
We can also confirm the stationarity of time series using Hurst exponent.
In [75]:
import hurst
H, c,data = hurst.compute_Hc(train)
print("H = {:.4f}, c = {:.4f}".format(H,c))
H = 0.1660,c = 5.0740
H = 0.1660, c = 5.0740
H<0.5 的值表示反持续性行为,H>0.5 表示持续性行为或趋势序列。H=0.5 表示随机游走/布朗运动。H<0.5 的值,确认我们的序列是平稳的。
The value of H<0.5 shows anti-persistent behavior, and H>0.5 shows persistent behavior or a trending series. H=0.5 shows random walk/Brownian motion. The value of H<0.5, confirming that our series is stationary.
对于非平稳时间序列,我们将“d”参数设置为 1。此外,自回归趋势参数“p”和移动平均趋势参数“q”的值是根据平稳时间序列计算的,即通过在对时间序列求差后绘制 ACP 和 PACP 来计算的。
For non-stationary time series, we set ‘d’ parameter as 1. Also, the value of the auto-regressive trend parameter ‘p’ and the moving average trend parameter ‘q’, is calculated on the stationary time series i.e by plotting ACP and PACP after differencing the time series.
ARIMA 模型的特点在于 3 个参数 (p,d,q),现在我们已经了解了它,因此让我们对我们的时间序列建模并预测温度的未来值。
ARIMA Model, which is characterized by 3 parameter, (p,d,q) are now clear to us, so let us model our time series and predict the future values of temperature.
In [156]:
from statsmodels.tsa.arima_model import ARIMA
model = ARIMA(train.values, order=(5, 0, 2))
model_fit = model.fit(disp=False)
In [157]:
predictions = model_fit.predict(len(test))
test_ = pandas.DataFrame(test)
test_['predictions'] = predictions[0:1871]
In [158]:
plt.plot(df['T'])
plt.plot(test_.predictions)
plt.show()
In [167]:
error = sqrt(metrics.mean_squared_error(test.values,predictions[0:1871]))
print ('Test RMSE for ARIMA: ', error)
ARIMA 的测试 RMSE:43.21252940234892
Test RMSE for ARIMA: 43.21252940234892