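How to include percentages in a pivot table in Pandas?
Seaborn is a visualization library for drawing statistical graphics in Python. It provides attractive default styles and color palettes that make statistical plots more appealing. It is built on top of matplotlib and integrates closely with pandas data structures.
A pivot table summarizes data using aggregate statistics such as counts, sums, and means. To compute the percentage of a category in a pivot table, divide the category's count by the total count and multiply by 100. The examples below show how to include percentages in a pivot table.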
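The ratio described above can be sketched in a few lines. This is a minimal illustration only: the `gender` series is hypothetical sample data, not part of the examples that follow.

```python
import pandas as pd

# Hypothetical sample column, used only to illustrate the ratio.
gender = pd.Series(['Male', 'Female', 'Male', 'Female',
                    'Female', 'Male', 'Male', 'Male'])

# Count of rows per category.
counts = gender.value_counts()

# Category count divided by total count, expressed as a percentage.
percent = counts / counts.sum() * 100
print(percent)
```

Calling `value_counts(normalize=True) * 100` gives the same numbers in one step, which is the form Example 1 relies on.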
Example 1:
In the example below, a pivot table is created for the given dataset, and the percentage of each gender is computed.
Python3
# importing pandas library
import pandas as pd

# creating dataframe
df = pd.DataFrame({'Name': ['John', 'Sammy', 'Stephan', 'Joe', 'Emily', 'Tom'],
                   'Gender': ['Male', 'Female', 'Male',
                              'Female', 'Female', 'Male'],
                   'Age': [45, 6, 4, 36, 12, 43]})
print("Dataset")
print(df)
print("-"*40)

# categorizing in age groups
def age_bucket(age):
    if age <= 18:
        return "<18"
    else:
        return ">18"

df['Age Group'] = df['Age'].apply(age_bucket)

# calculating gender percentage
gender = pd.DataFrame(df.Gender.value_counts(normalize=True)*100).reset_index()
gender.columns = ['Gender', '%Gender']
df = pd.merge(left=df, right=gender, how='inner', on=['Gender'])

# creating pivot table
table = pd.pivot_table(df, index=['Gender', '%Gender', 'Age Group'],
                       values=['Name'], aggfunc={'Name': 'count'})

# display table
print("Table")
print(table)
Output:
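As an alternative to the merge step in Example 1, the percentages can be computed while the table is being built. The following is a minimal sketch, not the method used above: it swaps in `pd.crosstab` with `normalize='all'`, and it rebuilds only the Gender and Age columns of the Example 1 dataset.

```python
import pandas as pd

# Same Gender and Age values as in Example 1 (names omitted for brevity).
df = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Female', 'Male'],
    'Age': [45, 6, 4, 36, 12, 43],
})

# Same bucketing rule as Example 1: ages up to 18 are labelled "<18".
df['Age Group'] = df['Age'].apply(lambda age: "<18" if age <= 18 else ">18")

# Cross-tabulate Gender against Age Group; normalize='all' divides every
# cell count by the total number of rows, so multiplying by 100 gives percentages.
pct_table = pd.crosstab(df['Gender'], df['Age Group'], normalize='all') * 100
print(pct_table.round(2))
```

Passing `normalize='index'` instead would make each row sum to 100, which gives the age-group split within each gender rather than the share of the whole dataset.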
Example 2:
This example shows how to compute each value in a column as a percentage of that column's total:
Python3
# importing required libraries
import pandas as pd

# creating dataframe
df = pd.DataFrame({
    'Name': ['John', 'Emily', 'Smith', 'Joe'],
    'Gender': ['Male', 'Female', 'Male', 'Female'],
    'Salary(in $)': [20, 40, 35, 28]})
print("Dataset")
print(df)
print("-"*40)

# creating pivot table
table = pd.pivot_table(df, index=['Gender', 'Name'])

# calculating each salary as a percentage of the column total
table['% Income'] = (table['Salary(in $)']/table['Salary(in $)'].sum())*100

# display table
print("Pivot Table")
print(table)
Output:
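Example 2 divides every salary by the grand total. If the share within each gender is wanted instead, one possible sketch is to divide by a per-group sum. The column name `% of Gender` is hypothetical, and passing `values` and `aggfunc='sum'` explicitly is an assumption made here so the result does not depend on `pivot_table` defaults.

```python
import pandas as pd

# Same dataset as Example 2.
df = pd.DataFrame({
    'Name': ['John', 'Emily', 'Smith', 'Joe'],
    'Gender': ['Male', 'Female', 'Male', 'Female'],
    'Salary(in $)': [20, 40, 35, 28]})

# Pivot table indexed by Gender and Name, aggregating the salary column.
table = pd.pivot_table(df, index=['Gender', 'Name'],
                       values='Salary(in $)', aggfunc='sum')

# Sum of salaries per gender, broadcast back to every row of that gender.
gender_total = table.groupby(level='Gender')['Salary(in $)'].transform('sum')

# Each salary as a percentage of its own gender's total.
table['% of Gender'] = table['Salary(in $)'] / gender_total * 100
print(table.round(2))
```

With this layout the percentages sum to 100 within each gender, so the two approaches answer different questions and can sit side by side in the same table.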