Python | Pandas dataframe.resample()
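Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages, and it makes importing and analyzing data much easier.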
The Pandas dataframe.resample() function is primarily used with time series data.
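A time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive, equally spaced points in time. resample() is a convenience method for frequency conversion and resampling of time series. The object must have a datetime-like index (DatetimeIndex, PeriodIndex, or TimedeltaIndex), or you must pass datetime-like values to the on or level keyword.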
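The two ways of supplying datetimes mentioned above can be compared on a tiny, hypothetical dataset: once with the dates as a DatetimeIndex, and once with the dates kept in an ordinary column that is named through on=. This is only a sketch; the column name "date" and the values are invented for illustration, and both calls should produce the same 2-day sums.

```python
import pandas as pd

# Hypothetical daily readings; the values are invented for illustration.
dates = pd.date_range("2018-01-01", periods=6, freq="D", name="date")
values = [1, 2, 3, 4, 5, 6]

# Option 1: the datetimes form the index (a DatetimeIndex).
indexed = pd.DataFrame({"value": values}, index=dates)
by_index = indexed.resample("2D").sum()

# Option 2: the datetimes live in an ordinary column, named via on=.
with_column = pd.DataFrame({"date": dates, "value": values})
by_column = with_column.resample("2D", on="date").sum()

print(by_index["value"].tolist())   # [3, 7, 11]
print(by_column["value"].tolist())  # [3, 7, 11]
```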
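Syntax : DataFrame.resample(rule, how=None, axis=0, fill_method=None, closed=None, label=None, convention='start', kind=None, loffset=None, limit=None, base=0, on=None, level=None)
This signature comes from older pandas releases. The how, fill_method, and limit keywords were removed in pandas 1.0, and loffset and base were removed in pandas 2.0 in favor of offset and origin.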
Parameters :
rule : the offset string or object representing target conversion
axis : int, optional, default 0
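closed : {'right', 'left'}. Which side of each bin interval is closed (included in the bin). Defaults to 'left', except for end-anchored frequencies such as 'M', 'Q', and 'W', which default to 'right'.
label : {'right', 'left'}. Which bin edge is used to label each bucket. Same defaults as closed.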
convention : For PeriodIndex only, controls whether to use the start or end of rule
loffset : Adjust the resampled time labels
base : For frequencies that evenly subdivide 1 day, the "origin" of the aggregated intervals. For example, for '5min' frequency, base could range from 0 through 4. Defaults to 0.
on : For a DataFrame, column to use instead of index for resampling. Column must be datetime-like.
level : For a MultiIndex, level (name or number) to use for resampling. Level must be datetime-like.
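The effect of closed and label is easiest to see on a very small series. The sketch below uses hypothetical minute-level readings and sums six consecutive values into 3-minute bins: first with the defaults for a fixed '3min' frequency (intervals closed on the left and labelled by their left edge), then with both parameters set to 'right'.

```python
import pandas as pd

# Hypothetical readings at 09:00, 09:01, ..., 09:05 (values are illustrative only).
idx = pd.date_range("2018-01-01 09:00", periods=6, freq="min")
s = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

# Defaults for '3min': bins [09:00, 09:03) and [09:03, 09:06), labelled 09:00 and 09:03.
left = s.resample("3min").sum()

# closed='right', label='right': bins (08:57, 09:00], (09:00, 09:03], (09:03, 09:06],
# each labelled by its right edge.
right = s.resample("3min", closed="right", label="right").sum()

print(left.tolist())   # [6, 15]
print(right.tolist())  # [1, 9, 11]
```

Because the reading at 09:00 sits exactly on a bin edge, it falls into the first bin under the default and into its own bin when the right side is closed.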
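Resampling converts time series data to a different frequency. Downsampling (for example, daily to monthly) aggregates all observations that fall into each new period, while upsampling (for example, daily to hourly) creates new, finer-grained periods whose values must be filled in. We can apply various frequencies to resample our time series data, and it is a very important technique in data analysis.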
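A minimal sketch of both directions, assuming a hypothetical price series observed every second day. Downsampling to 4-day bins needs an aggregation such as mean(); upsampling to daily bins needs a fill rule, shown here with ffill() (carry the last known value forward) and with asfreq(), which leaves the new slots empty.

```python
import pandas as pd

# Hypothetical prices observed on 1, 3 and 5 January 2018 (illustrative values).
idx = pd.date_range("2018-01-01", periods=3, freq="2D")
prices = pd.Series([10.0, 14.0, 11.0], index=idx)

# Downsampling: several observations collapse into each 4-day bin.
down = prices.resample("4D").mean()

# Upsampling: daily slots are created between the observations.
up = prices.resample("D").ffill()      # last known price carried forward
gaps = prices.resample("D").asfreq()   # new slots left as NaN

print(down.tolist())  # [12.0, 11.0]
print(up.tolist())    # [10.0, 10.0, 14.0, 14.0, 11.0]
```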
The most commonly used time series frequencies are:
W : weekly frequency
M : month-end frequency
SM : semi-month-end frequency (15th and end of month)
Q : quarter-end frequency
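Frequency aliases are passed as the rule string, and some of them carry an implicit anchor. The sketch below, on hypothetical daily data covering Monday 1 January to Sunday 14 January 2018, illustrates that 'W' is shorthand for 'W-SUN': weekly bins end on Sundays and are labelled by that Sunday. An explicit anchor such as 'W-MON' moves the week boundary.

```python
import pandas as pd

# Hypothetical daily series: 0, 1, ..., 13 for 2018-01-01 (a Monday) to 2018-01-14.
idx = pd.date_range("2018-01-01", periods=14, freq="D")
daily = pd.Series(range(14), index=idx)

# 'W' == 'W-SUN': bins end on (and are labelled by) Sunday.
weekly = daily.resample("W").sum()
print(weekly.tolist())                  # [21, 70]
print(list(weekly.index.day_name()))    # ['Sunday', 'Sunday']

# 'W-MON': bins end on Monday instead, so the same days regroup.
weekly_mon = daily.resample("W-MON").sum()
print(weekly_mon.tolist())              # [0, 28, 63]
```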
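Many other time series frequencies are available. Note that in pandas 2.2 and later, 'M', 'SM', and 'Q' are deprecated in favor of 'ME', 'SME', and 'QE'. Let's see how to apply these frequencies to data and resample it.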
The examples below use a CSV file, apple.csv, containing Apple Inc.'s stock price data for a 1-year period, from 13-11-17 to 13-11-18.
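If apple.csv is not at hand, the following sketch builds a hypothetical stand-in with the layout the examples rely on: a date column plus open and close columns. The prices are an invented random walk, not Apple's actual data; the real file may contain additional columns, and the date range assumes that 13-11-17 means 13 November 2017.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for apple.csv (made-up prices on business days).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2017-11-13", "2018-11-13")
close = 170 + rng.normal(0, 1.5, len(dates)).cumsum()
open_ = close + rng.normal(0, 0.5, len(dates))
synthetic = pd.DataFrame(
    {"date": dates, "open": open_.round(2), "close": close.round(2)}
)

# Round-trip through CSV text so the same read_csv call as in the examples applies.
buffer = io.StringIO(synthetic.to_csv(index=False))
df = pd.read_csv(buffer, parse_dates=["date"], index_col="date")
print(df.head())
```

The resulting df can be used in place of the read_csv("apple.csv", ...) line in each example below.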
Example #1: Resampling the data at monthly frequency
# importing pandas as pd
import pandas as pd
# By default the "date" column is read as strings, so we
# convert it to datetime format with parse_dates=["date"].
# Resampling works only with time series data, so we also
# make "date" the index of the DataFrame with index_col="date".
df = pd.read_csv("apple.csv", parse_dates=["date"], index_col="date")
# Display the first 10 rows of the DataFrame
df.head(10)
# Resample the time series data by month,
# applying it to the stock's closing price.
# 'M' indicates month-end frequency ('ME' in pandas 2.2+)
monthly_resampled_data = df.close.resample('M').mean()
# The command above computes the mean closing price
# of each calendar month covered by the data.
monthly_resampled_data
Output:
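Besides the mean, several statistics can be computed per month in one pass by passing a list of function names to agg(). This sketch uses four hypothetical closing prices and the month-start alias 'MS' (each bin is labelled by the first day of its month), which sidesteps the 'M' versus 'ME' naming difference between pandas versions.

```python
import pandas as pd

# Hypothetical closing prices on four dates spanning two months.
idx = pd.to_datetime(["2018-01-10", "2018-01-20", "2018-02-05", "2018-02-25"])
close = pd.Series([170.0, 174.0, 160.0, 166.0], index=idx, name="close")

# One row per calendar month, one column per statistic.
monthly_stats = close.resample("MS").agg(["mean", "min", "max"])
print(monthly_stats)
```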
Example #2: Resampling the data at weekly frequency
# importing pandas as pd
import pandas as pd
# Resampling works only with time series data, so parse
# the "date" column as datetimes and make it the index.
df = pd.read_csv("apple.csv", parse_dates=["date"], index_col="date")
# Resample the time series data at weekly frequency,
# applying it to the stock's opening price. 'W' indicates week.
weekly_resampled_data = df.open.resample('W').mean()
# Mean opening price of each week over the 1-year period
weekly_resampled_data
Output:
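For price data, a weekly mean is only one possible summary. The resampler's ohlc() method reports the first, highest, lowest, and last value in each bin. The sketch below applies it to two weeks of hypothetical daily opening prices.

```python
import pandas as pd

# Hypothetical daily opening prices, Monday 2018-01-01 to Sunday 2018-01-14.
idx = pd.date_range("2018-01-01", periods=14, freq="D")
open_price = pd.Series(
    [10, 12, 11, 13, 9, 10, 14, 15, 15, 16, 12, 13, 17, 16],
    index=idx,
    dtype=float,
)

# One row per week (ending Sunday) with open/high/low/close columns.
weekly_ohlc = open_price.resample("W").ohlc()
print(weekly_ohlc)
```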
Example #3: Resampling the data at quarterly frequency
# importing pandas as pd
import pandas as pd
# Resampling works only with time series data, so parse
# the "date" column as datetimes and make it the index.
df = pd.read_csv("apple.csv", parse_dates=["date"], index_col="date")
# Resample the time series data at quarterly frequency.
# 'Q' indicates quarter-end frequency ('QE' in pandas 2.2+)
quarterly_resampled_data = df.open.resample('Q').mean()
# Mean opening price of each calendar quarter covered by the data
quarterly_resampled_data
Output:
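Quarterly bins follow calendar quarters, so a 1-year span that starts and ends mid-quarter yields five rows, with partial first and last quarters. The sketch below checks this on a hypothetical business-day index over the same span as the example data, assuming 13-11-17 means 13 November 2017. It uses the quarter-start alias 'QS', so each bin is labelled by the first day of its quarter.

```python
import pandas as pd

# Hypothetical business-day index from 13 Nov 2017 to 13 Nov 2018.
idx = pd.bdate_range("2017-11-13", "2018-11-13")
s = pd.Series(1.0, index=idx)

# Number of observations per calendar quarter.
quarters = s.resample("QS").size()
print(quarters.index.strftime("%Y-%m-%d").tolist())
# ['2017-10-01', '2018-01-01', '2018-04-01', '2018-07-01', '2018-10-01']
```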