Python Time Series (1) - Mango Docs



Last modified: 2023-12-03 14:46:45.017000 · Author: Mango

Introduction to Time Series in Python

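Python and its library ecosystem offer rich support for analyzing and processing time series. This introduction takes a closer look at the tools and techniques Python provides for working with time series data.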

Date and Time Data Types

In Python, dates and times are represented as objects. The standard library's datetime module provides classes for working with date and time data.

import datetime
today = datetime.date.today()  # get the current date
print(today)

Output (the printed date depends on the day the code is run):

2022-01-01

datetime.date.today() returns the current date as a datetime.date object. That object stores a year, month and day but no time of day.
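The sketch below looks inside a date object. It builds a fixed date with the datetime.date constructor instead of today(), so the output is reproducible. It then reads the date's fields. The specific dates are arbitrary examples.

```python
import datetime

# A fixed date instead of today(), so the output is reproducible
d = datetime.date(2022, 1, 1)

print(d.year, d.month, d.day)  # 2022 1 1
print(d.isoformat())           # 2022-01-01
print(d.weekday())             # 5 (Monday is 0, so 5 means Saturday)

# Subtracting two dates gives a datetime.timedelta
delta = datetime.date(2022, 3, 1) - d
print(delta.days)              # 59
```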

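The datetime.datetime class represents a date together with a time of day, with precision down to hours, minutes, seconds and microseconds. Its strptime() class method converts a string into a datetime.datetime object according to a format string. In the example below, %Y, %m and %d match the year, month and day, and %H, %M and %S match the hour, minute and second.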

import datetime
str_date = '2022-01-01 10:30:00'
new_date = datetime.datetime.strptime(str_date, '%Y-%m-%d %H:%M:%S')
print(new_date)

Output:

2022-01-01 10:30:00
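Parsing usually goes together with formatting. The sketch below is one illustration, using made-up input strings. It parses fractional seconds with %f and then turns the object back into text with strftime(), the inverse of strptime(). It also shows that a string which does not match the format raises ValueError.

```python
import datetime

# '%f' parses fractional seconds (up to 6 digits) into microseconds
dt = datetime.datetime.strptime('2022-01-01 10:30:00.250000', '%Y-%m-%d %H:%M:%S.%f')
print(dt.hour, dt.minute, dt.microsecond)  # 10 30 250000

# strftime() formats the object back into a string
print(dt.strftime('%Y/%m/%d %H:%M'))       # 2022/01/01 10:30

# A string that does not match the format raises ValueError
try:
    datetime.datetime.strptime('01/01/2022', '%Y-%m-%d')
except ValueError as exc:
    print('parse failed:', exc)
```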

Time Series Basics

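A time series is a set of data points ordered in time, usually made up of timestamps and their corresponding values. Python's ecosystem provides many tools for creating, manipulating and visualizing time series data.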

The pandas library makes it easy to create, read and manipulate time series data. Here is a simple example:

import pandas as pd
import numpy as np

# Create a time series: 7 consecutive daily timestamps, one random value each
dates = pd.date_range('20220101', periods=7)
ts = pd.Series(np.random.randn(len(dates)), index=dates)
print(ts)

Output (the values are random and differ on every run):

2022-01-01    1.117401
2022-01-02    0.048476
2022-01-03   -0.929202
2022-01-04   -1.020167
2022-01-05    0.678905
2022-01-06   -1.682856
2022-01-07    0.380923
Freq: D, dtype: float64

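date_range() generates 7 consecutive timestamps starting at 2022-01-01, using the default daily frequency ('D'). np.random.randn() draws one random value per timestamp. pd.Series() then pairs those values with the timestamps, which become the series' index (a DatetimeIndex).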
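date_range() can also take an explicit start and end with a custom step, and time series are often read from files rather than generated. The sketch below covers both under stated assumptions. The CSV text is made-up sample data standing in for a real file, and the column names 'date' and 'value' are hypothetical.

```python
import io

import pandas as pd

# Every 3 days from start to end; here both endpoints fall on the 3-day grid
idx = pd.date_range('2022-01-01', '2022-01-10', freq='3D')
print(idx)  # holds 2022-01-01, 01-04, 01-07 and 01-10

# Hypothetical CSV content; with a real file, pass its path instead of StringIO
csv_text = """date,value
2022-01-01,10.0
2022-01-02,12.5
2022-01-05,11.0
"""

# parse_dates converts the 'date' column to datetimes; index_col makes it the index
ts = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'], index_col='date')['value']
print(ts)
```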

Indexing and Selecting Time Series Data

When a pandas Series is indexed by timestamps, you can select and filter its data directly by date.

import pandas as pd
import numpy as np

# Create a time series
dates = pd.date_range('20220101', periods=7)
ts = pd.Series(np.random.randn(len(dates)), index=dates)

# Select the value for a single date
print(ts['2022-01-02'])

# Select the values for a date range (both endpoints are included)
print(ts['2022-01-02':'2022-01-06'])

Output (values are random):

0.6121526815908535
2022-01-02    0.612153
2022-01-03   -0.159406
2022-01-04    0.662330
2022-01-05    0.321166
2022-01-06   -1.068791
Freq: D, dtype: float64

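With timestamps as the index, a single date string selects one value and a slice of date strings selects a date range. Positional slicing excludes the end point, but label-based slicing by date includes both endpoints. That is why the slice above returns the data from 2022-01-02 through 2022-01-06.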
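Date-based selection goes further than exact dates. The sketch below uses hypothetical deterministic data, where each value is its day number, so the results can be checked by hand. A month string such as '2022-02' selects every day of that month (partial string indexing). A boolean mask on the index then filters by weekday.

```python
import pandas as pd

# Hypothetical data: 60 days from 2022-01-01 (a Saturday) to 2022-03-01; value = day number
dates = pd.date_range('2022-01-01', periods=60, freq='D')
ts = pd.Series(range(60), index=dates)

# Partial string indexing: a 'YYYY-MM' string selects every day of that month
feb = ts.loc['2022-02']
print(len(feb), feb.iloc[0])  # 28 31

# Boolean mask on the index: keep Monday (0) through Friday (4) only
weekdays = ts[ts.index.dayofweek < 5]
print(len(weekdays))  # 42
```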

Resampling Time Series Data

Resampling converts a time series from one frequency to another, for example from daily data to monthly data.

import pandas as pd
import numpy as np

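# Create a daily time series covering 2022-01-01 to 2022-03-31 (90 days)
rng = pd.date_range('20220101', periods=90, freq='D')
ts = pd.Series(np.random.randn(len(rng)), index=rng)

# Resample to monthly frequency and sum each month
# ('M' = month end; pandas 2.2+ prefers the alias 'ME' and warns on 'M')
ts_resample = ts.resample('M').sum()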
print(ts_resample)

Output (values are random):

2022-01-31    1.759051
2022-02-28    1.127972
2022-03-31    1.313488
Freq: M, dtype: float64

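In the example above, resample('M') groups the daily values into calendar months, and sum() adds up the values within each month. Each monthly result is labeled with the last day of its month.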
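Random data make resampling results hard to verify. The sketch below repeats the idea with hypothetical values 0 to 89 over 90 days and computes several aggregations at once with agg(). It uses the 'MS' alias, which labels each month by its first day. 'MS' is chosen here because it is accepted across pandas versions, whereas 'M' is deprecated in newer ones.

```python
import pandas as pd

# Hypothetical deterministic data: 2022-01-01 to 2022-03-31, values 0..89
dates = pd.date_range('2022-01-01', periods=90, freq='D')
ts = pd.Series(range(90), index=dates, dtype='float64')

# 'MS' labels each monthly bin with its first day; agg() applies several functions
monthly = ts.resample('MS').agg(['sum', 'mean'])
print(monthly)  # sum: 465, 1246, 2294; mean: 15.0, 44.5, 74.0
```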

Shifting Time Series Data

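Shifting moves the values of a time series forward or backward in time relative to its index. For example, the code below moves the whole series 2 days later.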

import pandas as pd
import numpy as np

# Create a time series
rng = pd.date_range('20220101', periods=7, freq='D')
ts = pd.Series(np.random.randn(len(rng)), index=rng)

# Shift the values 2 periods (here, 2 days) later
ts_shift = ts.shift(2)
print(ts)
print(ts_shift)

Output (values are random; the original series is printed first, then the shifted one):

2022-01-01    0.693592
2022-01-02    0.110929
2022-01-03    0.830291
2022-01-04    0.891311
2022-01-05    0.732596
2022-01-06    0.441619
2022-01-07   -0.799911
Freq: D, dtype: float64
2022-01-01         NaN
2022-01-02         NaN
2022-01-03    0.693592
2022-01-04    0.110929
2022-01-05    0.830291
2022-01-06    0.891311
2022-01-07    0.732596
Freq: D, dtype: float64

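In the example above, shift(2) moves every value 2 periods (here, 2 days) later while the index stays the same. The first 2 dates have no earlier value to take, so they become NaN. The last 2 original values drop off the end.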
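shift() has a few variations worth knowing. The sketch below shows three of them, using four hypothetical values.

- A negative period moves values earlier.
- Passing freq moves the timestamps instead of the values, so no NaN appears.
- A common use is subtracting the shifted series to get the change from the previous day.

```python
import pandas as pd

# Hypothetical values on 4 consecutive days
dates = pd.date_range('2022-01-01', periods=4, freq='D')
ts = pd.Series([10.0, 11.0, 9.0, 12.0], index=dates)

# A negative period shifts values earlier; the last date becomes NaN
earlier = ts.shift(-1)
print(earlier.tolist())  # [11.0, 9.0, 12.0, nan]

# With freq, the timestamps move instead of the values, so nothing becomes NaN
moved = ts.shift(2, freq='D')
print(moved.index[0])  # 2022-01-03 00:00:00

# Change from the previous day: current value minus the value shifted by 1
change = ts - ts.shift(1)
print(change.tolist())  # [nan, 1.0, -2.0, 3.0]
```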

Visualizing Time Series Data

The matplotlib library can be used to visualize time series data. Here is a simple example:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Create a time series and plot it (Series.plot() draws with matplotlib)
dates = pd.date_range('20220101', periods=7)
ts = pd.Series(np.random.randn(len(dates)), index=dates)
ts.plot()
plt.show()

Output:

[Figure: line plot of the time series]

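In the example above, ts.plot() uses matplotlib to draw the values against their dates, and plt.show() displays the figure. The plot makes it easy to see how the data change over time.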
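plt.show() needs a display, which scripts and servers often lack. A minimal sketch of an alternative is shown below. It assumes the non-interactive 'Agg' backend, draws on an explicit Axes so a title and axis labels can be set, and saves the figure as PNG instead of showing it. The data are a hypothetical seeded random walk, and the title and labels are arbitrary.

```python
import io

import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render without opening a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: a seeded random walk over 90 days, reproducible between runs
gen = np.random.default_rng(0)
dates = pd.date_range('2022-01-01', periods=90, freq='D')
ts = pd.Series(gen.standard_normal(len(dates)).cumsum(), index=dates)

# Draw on an explicit Axes so the title and labels can be set on it
fig, ax = plt.subplots(figsize=(8, 4))
ts.plot(ax=ax)
ax.set_title('Daily values')
ax.set_xlabel('Date')
ax.set_ylabel('Value')

# Save to an in-memory PNG; pass a filename such as 'ts.png' to write a file instead
buf = io.BytesIO()
fig.savefig(buf, format='png')
plt.close(fig)
```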

Summary

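Python provides a rich set of tools for processing and analyzing time series data. The datetime module creates and handles individual dates and times. The pandas library makes it easy to create, read and manipulate whole time series, including resampling, shifting and plotting them. Being comfortable with these techniques is therefore valuable for any Python programmer who works with time-stamped data.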