How to Calculate Autocorrelation in Python?

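Correlation generally describes the relationship between two variables. When the correlation is computed between a variable and its own values at previous time steps (lags), it is called autocorrelation.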
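For reference, a common estimator of the lag-k autocorrelation is shown below. It assumes a series y_1, ..., y_n with sample mean ȳ. This is the form statsmodels' acf() uses by default. pandas' Series.autocorr() instead computes a Pearson correlation between the series and its shifted copy, so the two can differ slightly.

```latex
r_k = \frac{\sum_{t=1}^{n-k} (y_t - \bar{y})(y_{t+k} - \bar{y})}{\sum_{t=1}^{n} (y_t - \bar{y})^2}, \qquad k = 0, 1, 2, \dots
```

By construction r_0 = 1. A value of r_k close to 1 means that observations k steps apart tend to move together.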

Method 1: Using lag_plot()

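This example uses the daily minimum temperatures dataset. As a first step, you can quickly check for autocorrelation with the lag_plot() function provided by pandas.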

Syntax:

pd.plotting.lag_plot(data, lag=1)

where,

data – the input time series (a pandas Series)
lag – integer lag; each value y(t) is plotted against y(t + lag)
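The sketch below shows what lag_plot() actually draws. It uses a small synthetic series as a hypothetical stand-in for the temperature data. It then checks that the plotted points are exactly the pairs (y(t), y(t + lag)). The non-interactive Agg backend is an assumption, chosen so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no window is opened
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in series: a slow sine wave plus a little noise
rng = np.random.default_rng(0)
t = np.arange(200)
series = pd.Series(np.sin(t / 10) + 0.1 * rng.standard_normal(200))

lag = 1
fig, ax = plt.subplots()
pd.plotting.lag_plot(series, lag=lag, ax=ax)

# The scatter points that lag_plot drew on the axes
plotted = np.asarray(ax.collections[0].get_offsets())

# The same pairs built by hand: x = y(t), y = y(t + lag)
expected = np.column_stack([series.values[:-lag], series.values[lag:]])

print(plotted.shape)
print(np.allclose(plotted, expected))
```

If the points cluster along the diagonal, the series is positively autocorrelated at that lag. A shapeless cloud suggests little autocorrelation.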

Data used: daily minimum temperatures in blr (daily-minimum-temperatures-in-blr.csv)

Python3

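# import modules
import matplotlib.pyplot as plt
import pandas as pd

# read the data from the csv; squeeze("columns") turns the single-column
# DataFrame into a Series (read_csv's squeeze=True was removed in pandas 2.0)
data = pd.read_csv("daily-minimum-temperatures-in-blr.csv",
                   header=0, index_col=0, parse_dates=True).squeeze("columns")

# display the first 15 rows
print(data.head(15))

# lag plot: y(t) against y(t + 1)
pd.plotting.lag_plot(data, lag=1)
plt.show()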

Output:

Method 2: Creating lagged variables at different time steps

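Observations at the current and previous time steps are important for predicting future steps. So we create lagged variables at different time steps, such as t+1, t+2 and t+3, using the pandas.concat() and shift() functions. shift() moves the values by a specified number of time steps. concat() then joins the lagged variables from the different time steps side by side, as shown below.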
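Here is a minimal illustration of what shift() and concat() do, using a tiny hypothetical five-value series rather than the temperature data. Each shift(k) pushes the values down k rows and fills the top with NaN. concat(axis=1) then places the shifted copies side by side, so each row holds values from consecutive time steps.

```python
import pandas as pd

# Hypothetical toy series standing in for the temperatures
values = pd.DataFrame([10.0, 11.0, 12.0, 13.0, 14.0])

# Same construction as the article: the oldest value (t) is on the left
lagged = pd.concat([values.shift(3), values.shift(2),
                    values.shift(1), values], axis=1)
lagged.columns = ['t', 't+1', 't+2', 't+3']

print(lagged)
```

Only the last two rows have all four columns filled. The leading NaN values are why corr() has to work on pairwise-complete observations.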

The pandas DataFrame.corr() function is then applied to the new DataFrame to compute the correlation matrix.

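Syntax:

pandas.DataFrame.corr(method='pearson')

where method – 'pearson' computes the standard (Pearson) correlation coefficient; 'kendall' and 'spearman' are also accepted.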
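The sketch below is a sanity check on a hypothetical synthetic AR(1)-style series. In the lagged DataFrame, the correlation between adjacent columns (for example 't+2' and 't+3') is the lag-1 autocorrelation. pandas also exposes this directly as Series.autocorr(lag). corr() drops NaN values pair by pair, so both give the same Pearson value. The method argument is switched to 'spearman' only to show the rank-based alternative.

```python
import numpy as np
import pandas as pd

# Hypothetical AR(1)-like series: each value depends on the previous one
rng = np.random.default_rng(42)
y = np.zeros(500)
for i in range(1, 500):
    y[i] = 0.7 * y[i - 1] + rng.standard_normal()
series = pd.Series(y)

# Lagged columns built the same way as in the article
values = pd.DataFrame(series.values)
lagged = pd.concat([values.shift(3), values.shift(2),
                    values.shift(1), values], axis=1)
lagged.columns = ['t', 't+1', 't+2', 't+3']

pearson = lagged.corr()                    # default method='pearson'
spearman = lagged.corr(method='spearman')  # rank-based alternative

lag1_from_matrix = pearson.loc['t+2', 't+3']
lag1_direct = series.autocorr(lag=1)       # corr(y, y.shift(1))

print(round(lag1_from_matrix, 4), round(lag1_direct, 4))
print(spearman.round(3))
```

Because the series was generated with a coefficient of 0.7, the lag-1 value should land near 0.7. The correlations between columns further apart should decay roughly geometrically.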

Example:

Python3

# import the required module
import pandas as pd

# read the csv data as a Series
data = pd.read_csv("daily-minimum-temperatures-in-blr.csv",
                   header=0, index_col=0, parse_dates=True).squeeze("columns")

# extracting only the temperature values
values = pd.DataFrame(data.values)

# shift the values by 3, 2, 1 and 0 time steps and join them side by side
dataframe = pd.concat([values.shift(3), values.shift(2),
                       values.shift(1), values], axis=1)
# naming the columns (shift(3) holds the oldest value, t)
dataframe.columns = ['t', 't+1', 't+2', 't+3']

# using corr() function to compute the correlation matrix
result = dataframe.corr()

print(result)

Output:

Method 3: Using plot_acf()

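A plot of the autocorrelation of a time series against the lag is called an autocorrelation function (ACF) plot. Such a plot is also called a correlogram. A correlogram shows the correlation at each lag up to a chosen maximum lag. The lagged variables with the highest correlation can be considered for modeling. Below is an example of computing and plotting the autocorrelation for the selected dataset.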

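The statsmodels library provides a function called plot_acf() for this.

Syntax:

statsmodels.graphics.tsaplots.plot_acf(x, lags=None, alpha=0.05)

where,

x – array of time series values
lags – an int or an array of lag values, used on the horizontal axis. When lags is an int, all lags up to and including that value are plotted; when it is omitted, a default number of lags is chosen.
alpha – if a number is given, confidence intervals at that level are plotted. For example, if alpha=0.05, 95% confidence intervals are shown, with the standard deviation computed according to Bartlett's formula. If None, no confidence intervals are plotted.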

Example:

Python3

# import the required modules
import matplotlib.pyplot as plt
import pandas as pd
from statsmodels.graphics.tsaplots import plot_acf

# read the csv data as a Series
data = pd.read_csv("daily-minimum-temperatures-in-blr.csv",
                   header=0, index_col=0, parse_dates=True).squeeze("columns")

# plot the autocorrelation
plot_acf(data)
plt.show()

Output: