How to Create a Pivot Table in Python Using Pandas?
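A pivot table is a statistical table that summarizes a larger table, such as a big dataset. It is a part of data processing. The summary in a pivot table may include the mean, median, sum, or other statistics. Pivot tables were originally associated with MS Excel, but we can create them in Python with Pandas using the dataframe.pivot_table() method. The similarly named dataframe.pivot() only reshapes data and does not accept an aggregation function.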
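To make the distinction between the two methods concrete, here is a minimal sketch. The small dataframe `df_small` is a hypothetical sample chosen so that one (Product, Category) pair appears twice; it is not the dataset used in the examples of this article.

```python
import pandas as pd

# Hypothetical sample: the ('Banana', 'Fruit') pair occurs twice
df_small = pd.DataFrame({
    'Product': ['Banana', 'Banana', 'Orange'],
    'Category': ['Fruit', 'Fruit', 'Fruit'],
    'Amount': [617, 384, 610],
})

# pivot() only reshapes, so duplicate index/column pairs raise ValueError
try:
    df_small.pivot(index='Product', columns='Category', values='Amount')
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print('pivot() failed:', err)

# pivot_table() aggregates the duplicates with aggfunc instead
totals = df_small.pivot_table(index='Product', columns='Category',
                              values='Amount', aggfunc='sum')
print(totals)
```

Because the rows being summarized usually contain repeated keys, pivot_table() is generally the method to reach for when a summary is wanted.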
Syntax : dataframe.pivot_table(values=None, index=None, columns=None, aggfunc='mean')
Parameters (the most commonly used ones) –
index: Column(s) whose values become the row labels of the pivot table.
columns: Column(s) whose values become the column labels of the pivot table.
values: Column(s) to aggregate.
aggfunc: function, list of functions, or dict; default 'mean' (documented as numpy.mean in older pandas versions).
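The examples in this article use only index, values, and a single aggfunc or a list of them. As a rough sketch of the remaining options, the snippet below shows the columns parameter and a dict passed as aggfunc. The dataframe `sales` and its 'Region' column are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical sample data, not the dataset used elsewhere in this article
sales = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West', 'East'],
    'Category': ['Fruit', 'Vegetable', 'Fruit', 'Vegetable', 'Fruit'],
    'Quantity': [3, 8, 4, 5, 9],
    'Amount': [100, 200, 300, 400, 50],
})

# index gives the row labels, columns gives the column labels,
# and each cell holds the summed Amount for that Region/Category pair
by_region = sales.pivot_table(index='Region', columns='Category',
                              values='Amount', aggfunc='sum')
print(by_region)

# A dict maps each value column to its own aggregation function
summary = sales.pivot_table(index='Category',
                            values=['Quantity', 'Amount'],
                            aggfunc={'Quantity': 'sum', 'Amount': 'mean'})
print(summary)
```

A dict is convenient when different columns call for different statistics, for example a total quantity alongside an average amount.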
Example 1:
Let's first create a dataframe of fruit and vegetable sales.
# importing pandas
import pandas as pd

# creating dataframe
df = pd.DataFrame({'Product': ['Carrots', 'Broccoli', 'Banana', 'Banana',
                               'Beans', 'Orange', 'Broccoli', 'Banana'],
                   'Category': ['Vegetable', 'Vegetable', 'Fruit', 'Fruit',
                                'Vegetable', 'Fruit', 'Vegetable', 'Fruit'],
                   'Quantity': [8, 5, 3, 4, 5, 9, 11, 8],
                   'Amount': [270, 239, 617, 384, 626, 610, 62, 90]})
df
Output:
Get the total sales of each product
# creating a pivot table of total sales per product;
# aggfunc='sum' adds up the Amount values
# within each product
pivot = df.pivot_table(index=['Product'],
                       values=['Amount'],
                       aggfunc='sum')
print(pivot)
Output:
Get the total sales of each category
# creating a pivot table of total sales per category;
# aggfunc='sum' adds up the Amount values
# within each category
pivot = df.pivot_table(index=['Category'],
                       values=['Amount'],
                       aggfunc='sum')
print(pivot)
Output:
Get the total sales by category and product
# creating a pivot table of sales by both product
# and category; aggfunc='sum' adds up the Amount
# values within each (Product, Category) pair
pivot = df.pivot_table(index=['Product', 'Category'],
                       values=['Amount'], aggfunc='sum')
print(pivot)
Output:
Get the mean, median, and minimum sale by category
# creating a pivot table of the median, mean, and
# minimum sale by category; a list of function names
# is used instead of a set so that the order of the
# result columns is predictable
pivot = df.pivot_table(index=['Category'], values=['Amount'],
                       aggfunc=['median', 'mean', 'min'])
print(pivot)
Output:
Get the mean, median, and minimum sale by product
# creating a pivot table of the median, mean, and
# minimum sale by product; a list of function names
# is used instead of a set so that the order of the
# result columns is predictable
pivot = df.pivot_table(index=['Product'], values=['Amount'],
                       aggfunc=['median', 'mean', 'min'])
print(pivot)
Output: