How to group data by time intervals in Python Pandas?

Prerequisite: pandas

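Grouping data by time intervals comes up naturally in time series analysis. A time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive, equally spaced points in time.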

Pandas provides two very useful tools for grouping data by time: resample() and Grouper.

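resample(): This function is used primarily for time series data. It is a convenience method for frequency conversion and resampling of time series. The object must have a datetime-like index (DatetimeIndex, PeriodIndex, or TimedeltaIndex), or datetime-like values must be passed to the on or level keyword. Resampling sorts the actual data into bins at the new frequency, and an aggregation such as sum() or mean() is then applied to each bin.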
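
The index requirement can be illustrated with a minimal sketch. The small DataFrame below is hypothetical data made up for this example, not the article's dataset. It shows that a plain string index is rejected and that converting it with pd.to_datetime makes resampling possible.

```python
import pandas as pd

# Hypothetical data: dates stored as plain strings in the index.
raw = pd.DataFrame(
    {"price": [1.0, 2.0, 3.0]},
    index=["2021-01-01", "2021-01-02", "2021-01-08"],
)

# resample() refuses an index that is not datetime-like.
try:
    raw.resample("W").sum()
    failed = False
except TypeError:
    failed = True

# After conversion to a DatetimeIndex, weekly resampling works.
raw.index = pd.to_datetime(raw.index)
weekly = raw.resample("W").sum()
print(weekly)
```

The first two rows fall in the week ending Sunday 2021-01-03 and the third in the week ending 2021-01-10, so each weekly bin sums to 3.0.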

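Syntax : DataFrame.resample(rule, how=None, axis=0, fill_method=None, closed=None, label=None, convention='start', kind=None, loffset=None, limit=None, base=0, on=None, level=None)

Note: this signature comes from older pandas releases. The how, fill_method, limit, loffset, and base arguments no longer exist in pandas 2.0 and later; origin and offset take over the role of base.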

Parameters :

rule : the offset string or object representing target conversion
axis : int, optional, default 0
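closed : {'right', 'left'}, which side of each bin interval is closed
label : {'right', 'left'}, which bin edge is used to label the bucket
convention : For PeriodIndex only, controls whether to use the start or end of rule
loffset : Adjust the resampled time labels (removed in pandas 2.0; add the offset to the result's index instead)
base : For frequencies that evenly subdivide 1 day, the "origin" of the aggregated intervals. For example, for '5min' frequency, base could range from 0 through 4. Defaults to 0. (Removed in pandas 2.0; use origin or offset instead.)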
on : For a DataFrame, column to use instead of index for resampling. Column must be datetime-like.
level : For a MultiIndex, level (name or number) to use for resampling. Level must be datetime-like.
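
As a rough illustration of the on, closed, and label parameters, the sketch below resamples ten one-minute readings into 5-minute bins. The column names ts and value and the data itself are assumptions made up for this example.

```python
import pandas as pd

# Hypothetical data: one reading per minute, 09:00 through 09:09.
df = pd.DataFrame({
    "ts": pd.date_range("2021-01-01 09:00", periods=10, freq="min"),
    "value": range(10),
})

# on="ts" resamples on a column instead of the index.
# Default for '5min' is closed='left', label='left':
# bins [09:00, 09:05) and [09:05, 09:10), labeled by their left edge.
left = df.resample("5min", on="ts")["value"].sum()

# closed='right', label='right':
# bins (08:55, 09:00], (09:00, 09:05], (09:05, 09:10], labeled by their right edge.
right = df.resample("5min", on="ts", closed="right", label="right")["value"].sum()

print(left)
print(right)
```

With the default settings the sums are 10 and 35. With right-closed bins the 09:00 reading lands in its own bin, so the sums become 0, 15, and 30.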

For example: the quantity added each month, or the total amount added each year.
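
A minimal sketch of those two questions, using made-up daily data (one unit of quantity at a price of 2.0 per day across 2020 and 2021). The 'MS' and 'YS' aliases are chosen here because they label each bin by the start of the month or year; month-end and year-end aliases also exist, but their spelling differs between pandas versions.

```python
import pandas as pd

# Hypothetical daily data covering 2020-01-01 through 2021-12-31.
idx = pd.date_range("2020-01-01", "2021-12-31", freq="D")
data = pd.DataFrame({"quantity": 1, "price": 2.0}, index=idx)

# Quantity added in each month (bins labeled by the first day of the month).
monthly_quantity = data["quantity"].resample("MS").sum()

# Total amount added in each year (bins labeled by January 1).
yearly_amount = data["price"].resample("YS").sum()

print(monthly_quantity.head(3))
print(yearly_amount)
```

Because 2020 is a leap year, February 2020 sums to 29 and the 2020 total is 732.0, against 730.0 for 2021.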

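Grouper: pd.Grouper lets the user specify the basis on which the data should be grouped for analysis, such as a key column, an index level, or a time frequency.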

Syntax: dataframe.groupby(pd.Grouper(key, level, freq, axis, sort, label, convention, base, loffset, origin, offset))

Parameters:

key: selects the target column to be grouped
level: level of the target index
freq: group by the specified frequency when the target column (or index level) is datetime-like
axis: name or number of axis
sort: to enable sorting
label: interval boundary to be used for labeling, valid only when freq parameter is passed.
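convention: used only when the grouper is a PeriodIndex and the freq parameter is passed
base: works only when freq is passed (removed in pandas 2.0; use origin or offset instead)
loffset: works only when freq is passed (removed in pandas 2.0)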
origin: timestamp to adjust grouping on the basis of
offset: offset timedelta added to the origin
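
The key and freq parameters can be sketched as follows. The sales table is hypothetical; its column names mirror the ones used later in this article, but the values are invented for illustration. Combining pd.Grouper with an ordinary column name in the same groupby call gives one group per week and store type.

```python
import pandas as pd

# Hypothetical sales records, made up for this example.
sales = pd.DataFrame({
    "created_at": pd.to_datetime([
        "2021-01-04", "2021-01-05", "2021-01-12", "2021-01-13", "2021-01-14",
    ]),
    "store_type": ["online", "retail", "online", "online", "retail"],
    "quantity": [1, 2, 3, 4, 5],
})

# key names the datetime column; freq="W" bins it into weeks ending on Sunday.
weekly = (
    sales.groupby([pd.Grouper(key="created_at", freq="W"), "store_type"])["quantity"]
    .sum()
)
print(weekly)
```

Unlike resample(), this approach does not need the datetime column to be the index, and it can be mixed with other grouping keys.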

Approach

Import the module
Load or create the data
Resample the data as required
Group the data
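
The four steps can be tried end to end without any file on disk. The sketch below is only an illustration: the CSV text is a tiny invented stand-in for a real dataset, read from memory with io.StringIO, and its column names are assumptions.

```python
import io

import pandas as pd

# Step 2: a tiny in-memory CSV standing in for a real file (hypothetical data).
csv_text = """created_at,store_type,price,quantity
2021-01-04 10:00:00,online,10.0,1
2021-01-06 11:30:00,retail,20.0,2
2021-01-11 09:15:00,online,30.0,3
2021-01-15 16:45:00,retail,40.0,4
"""
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["created_at"])
data = data.set_index("created_at")

# Step 3: resample to weekly totals (weeks end on Sunday by default).
weekly = data.resample("W")[["price", "quantity"]].sum()

# Step 4: group by week and store type.
by_store = data.groupby([pd.Grouper(freq="W"), "store_type"])["price"].sum()

print(weekly)
print(by_store)
```

Here the first two rows fall in the week ending 2021-01-10 and the last two in the week ending 2021-01-17.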

An implementation of this approach is given below:

DataFrame in use: timeseries.csv

Program: Aggregating using resampling

Python3

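import pandas as pd

# Load the dataset (replace the placeholder with the path to timeseries.csv)
data = pd.read_csv('path of dataset')

# Use the created_at column as the index
data = data.set_index('created_at')

# Convert the index to a DatetimeIndex so it can be resampled
data.index = pd.to_datetime(data.index)

# Weekly total of price. By default each weekly label falls at 00:00:00;
# here the labels are shifted by 30 minutes 30 seconds.
# resample(..., loffset=...) was removed in pandas 2.0, so the shift is
# added to the resulting index instead.
label_shift = pd.Timedelta(minutes=30, seconds=30)
weekly_price = data.resample('W').price.sum()
weekly_price.index = weekly_price.index + label_shift
print(weekly_price.head(2))

# Several columns can be aggregated at once: this shows the quantity added
# in each week as well as the total amount added in each week
weekly_totals = data.resample('W').agg({'price': 'sum', 'quantity': 'sum'})
weekly_totals.index = weekly_totals.index + label_shift
print(weekly_totals.head(5))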

Output:

Program: Grouping data based on different time intervals

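In the first part we grouped the data the way resampling does (by days, weeks, months, and so on). Here we group the data by store type within each month and then aggregate as we did with resampling, which gives the quantity added and the total amount added for each store type in each month.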

Python3

import pandas as pd

# Load the dataset (replace the placeholder with the path to timeseries.csv)
data = pd.read_csv(r'path of dataset')

# Use the created_at column as the index
data = data.set_index('created_at')

# Convert the index to a DatetimeIndex so pd.Grouper can bin it by frequency
data.index = pd.to_datetime(data.index)

# Group by calendar month and store type, then aggregate each group.
# 'M' is the month-end alias; pandas 2.2 and later spell it 'ME'.
monthly_by_store = data.groupby([pd.Grouper(freq='M'), 'store_type']).agg(
    total_quantity=('quantity', 'sum'),
    total_amount=('price', 'sum'))
print(monthly_by_store.head(5))

Output: