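How to count unique values in a Pandas Groupby object?
Prerequisite: Pandas
As the name suggests, groupby groups rows that share the same value in one or more columns. The number of unique values within each group of a pandas GroupBy object can be computed with the groupby(), agg(), and reset_index() methods. This article describes how to use pandas to get the count of unique values for selected columns of a DataFrame.
Functions used
- groupby() – The groupby() function splits data into groups according to some criteria. pandas objects can be split along any of their axes.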
Syntax: DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, **kwargs)
Parameters :
- by : mapping, function, str, or iterable
- axis : int, default 0
- level : If the axis is a MultiIndex (hierarchical), group by a particular level or levels
- as_index : For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output
- sort : Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. groupby preserves the order of rows within each group.
- group_keys : When calling apply, add group keys to index to identify pieces
- squeeze : Reduce the dimensionality of the return type if possible, otherwise return a consistent type (deprecated in pandas 1.1 and removed in pandas 2.0)
Returns : GroupBy object
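To make the return value concrete, here is a minimal sketch of what groupby() hands back. The small DataFrame is hypothetical sample data chosen only for illustration; `ngroups` and `groups` are attributes of the resulting GroupBy object.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"Col_1": ["a", "b", "a"], "Col_2": [1, 2, 3]})

# groupby() does not compute anything yet; it returns a GroupBy object.
grouped = df.groupby("Col_1")

# Number of groups, and the row labels that fall into each group.
n_groups = grouped.ngroups
group_rows = {key: list(idx) for key, idx in grouped.groups.items()}

print(n_groups)
print(group_rows)
```

An aggregation such as agg() is then needed to turn the groups into a result.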
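- agg() – agg() accepts a function or a list of functions and applies them to a Series or to each column of a DataFrame. When given a list of functions, agg() returns one result per function.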
Syntax: DataFrame.aggregate(func, axis=0, *args, **kwargs)
Parameters:
- func : callable, string, dictionary, or list of string/callables. Function to use for aggregating the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply. For a DataFrame, can pass a dict, if the keys are DataFrame column names.
- axis : (default 0) {0 or ‘index’, 1 or ‘columns’} 0 or ‘index’: apply function to each column. 1 or ‘columns’: apply function to each row.
Returns: Aggregated DataFrame
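As a rough illustration of the dict and list forms of `func`, the sketch below applies two aggregations to one column of a grouped DataFrame. The data is hypothetical, and the choice of "nunique" and "count" is only an example.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"Col_1": ["a", "b", "a", "b"], "Col_2": [1, 2, 1, 5]})

# The dict maps a column name to the aggregation(s) to apply to it.
# A list of functions produces one output column per function.
summary = df.groupby("Col_1").agg({"Col_2": ["nunique", "count"]})

print(summary)
```

With a list of functions the result has two-level column labels, for example ("Col_2", "nunique").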
- reset_index() – reset_index() resets the index of a DataFrame, replacing it with the default integer index that runs from 0 to the length of the data minus 1.
Syntax: DataFrame.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')
Parameters:
- level: int, str, or list. Removes only the given level(s) from the index; by default all levels are removed.
- drop: bool. If False, the old index is inserted into the data as a column; if True, it is discarded.
- inplace: Boolean value, make changes in the original data frame itself if True.
- col_level: Select in which column level to insert the labels.
- col_fill: Object, to determine how the other levels are named.
Return type: DataFrame
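The effect of reset_index() on an aggregated result can be sketched as follows. The data is again hypothetical; the point is only that the group labels start out as the index and end up as an ordinary column.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"Col_1": ["a", "b", "a"], "Col_2": [1, 2, 3]})

counts = df.groupby("Col_1").agg({"Col_2": "nunique"})
# After aggregation the group labels form the index, named "Col_1".
index_name = counts.index.name

# reset_index() moves the labels into a regular column and
# installs the default 0..n-1 integer index.
flat = counts.reset_index()

print(flat)
```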
Approach:
- Import the library
- Create the data
- Group the data
- Apply the aggregate function
- Reset the index
- Print the data
Example 1:
Python
# import pandas
import pandas as pd
# create dataframe
df = pd.DataFrame({'Col_1': ['a', 'b', 'c', 'b', 'a', 'd'],
                   'Col_2': [1, 2, 3, 3, 2, 1]})
# print original dataframe
print("original dataframe:")
print(df)
# group the rows by the values in Col_1
grouped = df.groupby("Col_1")
# count the unique Col_2 values within each group
result = grouped.agg({"Col_2": "nunique"})
# move the group labels from the index back into a column
result = result.reset_index()
# print the result; print() also works outside Jupyter, unlike display()
print("final dataframe:")
print(result)
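Output: grouping by Col_1, groups a and b each contain 2 unique Col_2 values, while c and d each contain 1.
Example 2: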
Python
# import pandas
import pandas as pd
# create dataframe
df = pd.DataFrame({'Col_1': ['a', 'b', 'c', 'b', 'a', 'd'],
                   'Col_2': [1, 2, 3, 3, 2, 1]})
# print original dataframe
print("original dataframe:")
print(df)
# group the rows by the values in Col_2
grouped = df.groupby("Col_2")
# count the unique Col_1 values within each group
result = grouped.agg({"Col_1": "nunique"})
# move the group labels from the index back into a column
result = result.reset_index()
# print the result; print() also works outside Jupyter, unlike display()
print("final dataframe:")
print(result)
Output: grouping by Col_2, each of the groups 1, 2, and 3 contains 2 unique Col_1 values.
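The three-step pattern can also be written more compactly. The sketch below is an alternative to the approach used in this article, not a replacement for it: it reuses the same data, calls nunique() directly on the selected column, and passes `as_index=False` to keep the group key as a column without a separate reset_index() call.

```python
import pandas as pd

df = pd.DataFrame({"Col_1": ["a", "b", "c", "b", "a", "d"],
                   "Col_2": [1, 2, 3, 3, 2, 1]})

# Select the column on the GroupBy object and call nunique() directly.
# The result is a Series indexed by the group labels.
per_group = df.groupby("Col_1")["Col_2"].nunique()

# as_index=False keeps Col_1 as a regular column, so reset_index()
# is not needed afterwards.
flat = df.groupby("Col_1", as_index=False).agg({"Col_2": "nunique"})

print(per_group)
print(flat)
```

The first form is convenient when only one column is of interest; the second returns the same shape as the examples above.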