How to include percentages in a pivot table in Pandas?

Last modified: 2022-05-13 01:55:18.499000    Author: Mango


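Seaborn is a visualization library for drawing statistical graphics in Python. It provides attractive default styles and color palettes, is built on top of the matplotlib library, and integrates closely with pandas data structures.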

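A pivot table summarizes data using statistics such as counts, sums, and means. To calculate the percentage of a category in a pivot table, divide the count of that category by the total count and multiply by 100. The examples below show how to include percentages in a pivot table: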
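The ratio described above can be sketched on its own before any pivot table is involved. The `colors` series below is hypothetical sample data introduced only for illustration; it is not part of the article's datasets.

```python
import pandas as pd

# Hypothetical sample data, used only to illustrate the ratio
colors = pd.Series(['red', 'blue', 'red', 'green', 'red', 'blue'], name='Color')

# Count of each category
counts = colors.value_counts()

# Percentage = category count / total count * 100
percentages = counts / counts.sum() * 100

# value_counts(normalize=True) returns the same ratios directly
shortcut = colors.value_counts(normalize=True) * 100

print(percentages.round(2))
```

Both `percentages` and `shortcut` hold the same values; the explicit division simply makes the count-over-total step visible.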

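Example 1:

In the example below, a pivot table is created for the given dataset, with the percentage of each gender calculated and included in the table.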

Python3
# importing pandas library
import pandas as pd
  
# creating dataframe
df = pd.DataFrame({'Name': ['John', 'Sammy', 'Stephan', 'Joe', 'Emily', 'Tom'],
                   'Gender': ['Male', 'Female', 'Male',
                              'Female', 'Female', 'Male'],
                   'Age': [45, 6, 4, 36, 12, 43]})
print("Dataset")
print(df)
print("-"*40)
  
# categorizing in age groups: 18 and under vs. over 18
def age_bucket(age):
    if age <= 18:
        return "<=18"
    else:
        return ">18"
  
df['Age Group'] = df['Age'].apply(age_bucket)
  
# calculating gender percentage
gender = pd.DataFrame(df.Gender.value_counts(normalize=True)*100).reset_index()
gender.columns = ['Gender', '%Gender']
df = pd.merge(left=df, right=gender, how='inner', on=['Gender'])
  
# creating pivot table
table = pd.pivot_table(df, index=['Gender', '%Gender', 'Age Group'],
                       values=['Name'], aggfunc={'Name': 'count'})
  
# display table
print("Table")
print(table)


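Output: the pivot table shows, for each gender, its percentage of the rows (50% each) and the number of names in each age group.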
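As an alternative to merging the percentage back into the dataframe, `pd.crosstab` with its `normalize` argument can compute shares directly. This is only a sketch of a different technique, not the method used in Example 1; the age-group labels and the variable name `pct` are chosen here for illustration.

```python
import pandas as pd

# Same dataset as Example 1
df = pd.DataFrame({'Name': ['John', 'Sammy', 'Stephan', 'Joe', 'Emily', 'Tom'],
                   'Gender': ['Male', 'Female', 'Male',
                              'Female', 'Female', 'Male'],
                   'Age': [45, 6, 4, 36, 12, 43]})

# Illustrative age-group labels (an assumption of this sketch)
df['Age Group'] = df['Age'].apply(lambda age: '<=18' if age <= 18 else '>18')

# normalize='all' divides every cell count by the total number of rows,
# so multiplying by 100 gives each cell's percentage of the whole dataset
pct = pd.crosstab(df['Gender'], df['Age Group'], normalize='all') * 100

print(pct.round(2))
```

Passing `normalize='index'` instead would give the age-group split within each gender, so each row would sum to 100.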

Example 2:

Here is another example, which shows how to calculate each value in a column as a percentage of that column's total:

Python3

# importing pandas library
import pandas as pd
  
# creating dataframe
df = pd.DataFrame({
    'Name': ['John', 'Emily', 'Smith', 'Joe'],
    'Gender': ['Male', 'Female', 'Male', 'Female'],
    'Salary(in $)': [20, 40, 35, 28]})
  
print("Dataset")
print(df)
print("-"*40)
  
# creating pivot table
table = pd.pivot_table(df, index=['Gender', 'Name'])
  
# calculating percentage
table['% Income'] = (table['Salary(in $)']/table['Salary(in $)'].sum())*100
  
# display table
print("Pivot Table")
print(table)

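Output: the pivot table lists each person's salary, grouped by gender, together with its share of the total salary in the '% Income' column.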
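A related variation, sketched here as an assumption rather than something the article itself does, is to express each salary as a percentage of its own gender's total instead of the overall total. The column name '% Income within Gender' and the variable `gender_total` are hypothetical names introduced for this sketch.

```python
import pandas as pd

# Same dataset as Example 2
df = pd.DataFrame({
    'Name': ['John', 'Emily', 'Smith', 'Joe'],
    'Gender': ['Male', 'Female', 'Male', 'Female'],
    'Salary(in $)': [20, 40, 35, 28]})

# Pivot table with the value column and aggregation spelled out
table = pd.pivot_table(df, index=['Gender', 'Name'],
                       values='Salary(in $)', aggfunc='sum')

# Total salary of each gender, repeated on every row of that gender
gender_total = table.groupby(level='Gender')['Salary(in $)'].transform('sum')

# Each salary as a percentage of its gender's total
table['% Income within Gender'] = table['Salary(in $)'] / gender_total * 100

print(table.round(2))
```

Because `transform('sum')` returns a result aligned with the original rows, the division works row by row without any merge.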