Time Series: A Concise Tutorial

Time Series - Auto Regression

For a stationary time series, an autoregression model treats the value of a variable at time t as a linear function of its values at the p preceding time steps. Mathematically, it can be written as:

$$y_{t} = C + \phi_{1}y_{t-1} + \phi_{2}y_{t-2} + \dots + \phi_{p}y_{t-p} + \epsilon_{t}$$


where

$p$ is the order of the autoregressive model, i.e. the number of lagged values it uses,

$C$ is a constant and $\phi_{1}, \dots, \phi_{p}$ are the model's coefficients,

$\epsilon_{t}$ is white noise, and

$y_{t-1}, y_{t-2}, \dots, y_{t-p}$ denote the values of the variable at previous time periods.
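To make the equation concrete, here is a minimal NumPy sketch that generates values from an AR(p) process term by term. The helper name `simulate_ar` is hypothetical, and the sketch assumes Gaussian white noise and p pre-sample values set to zero; it illustrates the formula and is not the tutorial's own code.

```python
import numpy as np

def simulate_ar(c, phi, n, noise_std=1.0, seed=None):
    """Generate n values of y_t = c + phi[0]*y_{t-1} + ... + phi[p-1]*y_{t-p} + eps_t.

    Assumptions (for illustration): eps_t ~ N(0, noise_std**2) and the
    p pre-sample values y_{1-p}, ..., y_0 are all zero.
    """
    rng = np.random.default_rng(seed)
    p = len(phi)
    y = np.zeros(p + n)  # first p entries hold the zero pre-sample values
    eps = rng.normal(0.0, noise_std, size=n)
    for t in range(p, p + n):
        # Reverse the window so it reads y_{t-1}, y_{t-2}, ..., y_{t-p},
        # matching the order of phi_1, phi_2, ..., phi_p
        lagged = y[t - p:t][::-1]
        y[t] = c + np.dot(phi, lagged) + eps[t - p]
    return y[p:]

# Noise-free AR(1) with C = 1 and phi_1 = 0.5
print(simulate_ar(1.0, [0.5], 5, noise_std=0.0))
```

Without noise, `simulate_ar(1.0, [0.5], 5, noise_std=0.0)` yields 1.0, 1.5, 1.75, 1.875, 1.9375, approaching the stationary mean $C / (1 - \phi_{1} - \dots - \phi_{p}) = 2$.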

The value of p can be chosen in several ways. One way to find a suitable value of p is to examine an autocorrelation plot.

Note: Before doing any analysis, split the available data into a training set and a test set in an 8:2 ratio. The test data is used only to measure the accuracy of the model, and we assume it is not available until after the predictions have been made. In a time series the order of the data points is essential, so the split must not shuffle the data: the earliest 80% of the observations form the training set and the most recent 20% form the test set.
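A minimal sketch of such an order-preserving 8:2 split, assuming the data is a pandas Series already sorted by time. `chronological_split` is a hypothetical helper that uses the same split arithmetic as the code later in this section, and the toy series is made-up data.

```python
import pandas as pd

def chronological_split(series, test_fraction=0.2):
    """Split a time-ordered Series without shuffling: the earliest values go to
    train and the last `test_fraction` of the values go to test."""
    split = len(series) - int(test_fraction * len(series))
    return series.iloc[:split], series.iloc[split:]

# Toy daily series (assumed data, for illustration only)
s = pd.Series(range(10), index=pd.date_range("2020-01-01", periods=10, freq="D"))
toy_train, toy_test = chronological_split(s)
print(len(toy_train), len(toy_test))  # 8 2
```

Slicing by position with `.iloc` keeps every training timestamp earlier than every test timestamp, which is exactly what a shuffled split would break.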

An autocorrelation plot, or correlogram, shows the correlation of a variable with its own values at previous time steps. It is based on Pearson's correlation and marks a 95% confidence interval as a shaded band. Let's see what it looks like for the temperature variable ('T') of our data.
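The idea of correlating a series with its own past can be sketched with pandas: `Series.autocorr(lag=k)` returns the Pearson correlation between a series and a copy of itself shifted by k steps. The helper names below are hypothetical, the AR(1) series is synthetic, and the band uses the common white-noise approximation ±1.96/√n. statsmodels' `plot_acf` uses a slightly different ACF estimator and, by default, Bartlett's formula for its band, so its numbers can differ somewhat.

```python
import numpy as np
import pandas as pd

def lag_correlations(series, max_lag):
    """Pearson correlation of the series with itself shifted by k = 1..max_lag steps."""
    return {k: series.autocorr(lag=k) for k in range(1, max_lag + 1)}

def white_noise_band(n, z=1.96):
    """Half-width of the approximate 95% band for a white-noise series of length n."""
    return z / np.sqrt(n)

# Synthetic AR(1) series y_t = 0.7 * y_{t-1} + eps_t (assumed data for illustration)
rng = np.random.default_rng(0)
values = np.zeros(5000)
for t in range(1, len(values)):
    values[t] = 0.7 * values[t - 1] + rng.normal()
series = pd.Series(values)

corr = lag_correlations(series, max_lag=3)
band = white_noise_band(len(series))
for k, r in corr.items():
    print(f"lag {k}: r = {r:.3f}, outside band: {abs(r) > band}")
```

For an AR(1) process the lag-k correlation is close to $\phi_{1}^{k}$, so here the values should sit near 0.7, 0.49 and 0.34, far outside the band.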

Showing the Autocorrelation Plot

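# Keep the first 80% of observations for training and the last 20% for testing,
# slicing by position so the time order is preserved
split = len(df) - int(0.2 * len(df))
train, test = df['T'].iloc[:split], df['T'].iloc[split:]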

import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

# Correlogram of the training series for lags 0 to 100
plot_acf(train, lags=100)
plt.show()

Lags whose autocorrelation falls outside the shaded blue region are considered to have a statistically significant correlation.
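Reading the plot can also be done numerically. The sketch below computes the standard sample autocorrelation with NumPy and lists the lags whose value lies outside ±1.96/√n. The function names are hypothetical, and the fixed band is the white-noise approximation rather than the Bartlett band `plot_acf` draws by default (which widens with the lag), so this sketch may flag a few more lags than the plot does.

```python
import numpy as np

def sample_acf(x, nlags):
    """r_k = sum_t (x_t - mean)(x_{t+k} - mean) / sum_t (x_t - mean)^2, for k = 0..nlags."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.dot(d, d)
    return np.array([np.dot(d[: len(d) - k], d[k:]) / denom for k in range(nlags + 1)])

def significant_lags(x, nlags, z=1.96):
    """Lags 1..nlags whose sample ACF lies outside the approximate band +/- z/sqrt(n)."""
    r = sample_acf(x, nlags)
    bound = z / np.sqrt(len(x))
    return [k for k in range(1, nlags + 1) if abs(r[k]) > bound]

# Example on a synthetic AR(1) series (assumed data, not the tutorial's dataset)
rng = np.random.default_rng(1)
y = np.zeros(2000)
for t in range(1, len(y)):
    y[t] = 0.8 * y[t - 1] + rng.normal()
print(significant_lags(y, nlags=10))
```

On the tutorial's data the same call could be applied to the training series, e.g. `significant_lags(train, nlags=100)`.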