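How to group data by time intervals in Python Pandas?
Prerequisite: Pandas
Grouping data by time intervals comes up naturally in time series analysis. A time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive, equally spaced points in time.
Pandas provides two very useful tools that we can use to group our data by time.
- resample(): This function is primarily used for time series data. It is a convenience method for frequency conversion and resampling of time series. The object must have a datetime-like index (DatetimeIndex, PeriodIndex, or TimedeltaIndex), or datetime-like values must be passed to the on or level keyword. Resampling splits the data into time bins of the requested frequency, and an aggregation is then applied to each bin.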
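Syntax : DataFrame.resample(rule, how=None, axis=0, fill_method=None, closed=None, label=None, convention='start', kind=None, loffset=None, limit=None, base=0, on=None, level=None)
Note: this is the signature from older pandas releases. how, fill_method and limit have since been dropped, and loffset and base were deprecated in pandas 1.1 and removed in pandas 2.0 in favour of the origin and offset parameters.
Parameters :
- rule : the offset string or object representing the target conversion
- axis : int, optional, default 0
- closed : {'right', 'left'}, which side of each bin interval is closed
- label : {'right', 'left'}, which bin edge is used to label the bin
- convention : For PeriodIndex only, controls whether to use the start or end of rule
- loffset : Adjust the resampled time labels (removed in pandas 2.0; shift the index of the result instead)
- base : For frequencies that evenly subdivide 1 day, the "origin" of the aggregated intervals. For example, for '5min' frequency, base could range from 0 through 4. Defaults to 0. (Removed in pandas 2.0; use origin or offset instead.)
- on : For a DataFrame, column to use instead of index for resampling. Column must be datetime-like.
- level : For a MultiIndex, level (name or number) to use for resampling. Level must be datetime-like.
Example use cases: the quantity added each month, or the total amount added each year.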
- Grouper: pd.Grouper lets the user specify the basis on which the data is grouped for analysis: a target column or index level and, for datetime-like targets, a frequency.
Syntax: dataframe.groupby(pd.Grouper(key, level, freq, axis, sort, label, convention, base, loffset, origin, offset))
Parameters:
- key: selects the target column to be grouped
- level: level of the target index
- freq: groups by the specified frequency if the target column (or index) is datetime-like
- axis: name or number of axis
- sort: whether to sort the resulting labels
- label: interval boundary to be used for labeling, valid only when the freq parameter is passed
- convention: used only when the grouper is a PeriodIndex and the freq parameter is passed
- base: works only when freq is passed (deprecated in pandas 1.1 and removed in 2.0; use origin or offset instead)
- loffset: works only when freq is passed (deprecated in pandas 1.1 and removed in 2.0; shift the index of the result instead)
- origin: timestamp on which the grouping is anchored
- offset: offset timedelta added to the origin
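To illustrate the key and freq parameters, here is a small sketch that combines a pd.Grouper with an ordinary column in a single groupby. The data is invented for illustration; the column name store_type is borrowed from the dataset used later in this article.

```python
import pandas as pd

# Hypothetical sample data; the values are invented for illustration.
df = pd.DataFrame({
    "created_at": pd.to_datetime([
        "2021-01-04 09:00", "2021-01-05 12:00", "2021-01-06 17:00",
        "2021-01-12 10:00", "2021-01-13 11:00", "2021-01-14 16:00",
    ]),
    "store_type": ["online", "retail", "online", "retail", "online", "retail"],
    "price": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

# `key` names the datetime column and `freq` sets the width of each time bin.
# The Grouper sits next to a plain column name, so the result is indexed by
# (week ending on Sunday, store_type).
weekly_by_store = (
    df.groupby([pd.Grouper(key="created_at", freq="W"), "store_type"])["price"]
      .sum()
)

print(weekly_by_store)
```

Because key points at a column, the frame does not need a DatetimeIndex here. When the timestamps are already the index, key can be omitted and only freq passed.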
Approach
- Import the module
- Load or create the data
- Resample the data as required
- Group (aggregate) the data
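The steps above can be sketched end to end as follows. An in-memory CSV stands in for a real file so that the snippet runs on its own; the rows are invented for illustration and are not taken from the article's dataset.

```python
import io

import pandas as pd

# Step 1 is the imports above. Step 2: load the data. An in-memory CSV stands
# in for a real file; the rows are hypothetical.
csv_text = """created_at,price,quantity
2021-01-01 08:00:00,10.0,1
2021-01-01 20:00:00,15.0,2
2021-01-02 09:30:00,20.0,1
2021-01-03 14:00:00,5.0,4
"""
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["created_at"])

# Step 3: resample() needs a datetime-like index, so set it first,
# then split the rows into daily bins.
data = data.set_index("created_at")
daily = data.resample("D")

# Step 4: aggregate each time bin.
summary = daily.agg({"price": "sum", "quantity": "sum"})
print(summary)
```

Passing parse_dates to read_csv converts the column while loading, which avoids a separate pd.to_datetime call afterwards.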
An implementation using this approach is given below.
Dataframe in use: timeseries.csv
Program: Aggregating using resample
Python3
import pandas as pd
# load the dataset (replace the placeholder with the path to timeseries.csv)
data = pd.read_csv('path of dataset')
# use the created_at column as the index
data = data.set_index(['created_at'])
# convert the index to a DatetimeIndex so that resample() can be used
data.index = pd.to_datetime(data.index)
# total price per week; loffset was removed in pandas 2.0, so the bin labels
# are shifted by 30 min 30 s after resampling instead
label_shift = pd.Timedelta(minutes=30, seconds=30)
weekly_price = data.resample('W').price.sum()
weekly_price.index = weekly_price.index + label_shift
print(weekly_price.head(2))
# aggregate several columns at once: the quantity added in each week
# as well as the total amount added in each week
weekly = data.resample('W').agg({'price': 'sum', 'quantity': 'sum'})
weekly.index = weekly.index + label_shift
print(weekly.head(5))
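Program: Grouping data based on different time intervals
In the first part we grouped the data the way resampling does (by days, months, and so on). Here we group the data by store type within each month and then aggregate as we did with resample, which gives the quantity added and the total amount added in each month for each store type.
Python3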
import pandas as pd
# load the dataset (replace the placeholder with the path to timeseries.csv)
data = pd.read_csv(r'path of dataset')
# use the created_at column as the index
data = data.set_index(['created_at'])
# convert the index to a DatetimeIndex so that pd.Grouper(freq=...) can bin it
data.index = pd.to_datetime(data.index)
# group by calendar month and store type, then total the quantity and price
# ('M' means month end; pandas 2.2 and later spell this alias 'ME')
monthly_by_store = data.groupby([pd.Grouper(freq='M'), 'store_type']).agg(
    total_quantity=('quantity', 'sum'),
    total_amount=('price', 'sum'))
print(monthly_by_store.head(5))
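As a rough check of the claim that grouping with pd.Grouper works like resampling, the sketch below runs both on the same made-up daily data and compares the weekly totals. The frame is invented for illustration and only reuses the article's column names.

```python
import pandas as pd

# Hypothetical data: one record per day for three weeks.
idx = pd.date_range("2021-01-01", periods=21, freq="D")
data = pd.DataFrame(
    {"price": range(1, 22), "quantity": [1] * 21},
    index=idx,
)
data.index.name = "created_at"

# Weekly totals via resample() ...
via_resample = data.resample("W").agg({"price": "sum", "quantity": "sum"})

# ... and the same totals via groupby() with a frequency-based Grouper.
via_grouper = data.groupby(pd.Grouper(freq="W")).agg(
    {"price": "sum", "quantity": "sum"}
)

print(via_resample)
print(via_grouper)
```

The practical difference is that groupby accepts further keys next to the Grouper (such as store_type above), whereas resample bins on time alone.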