📜  groupby - Python (1)

📅  Last modified: 2023-12-03 14:41:39.431000             🧑  Author: Mango

GroupBy in Python

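Python offers two common tools named groupby: itertools.groupby in the standard library, which groups consecutive items of any iterable by a key function, and the groupby method of a Pandas DataFrame, which splits rows by column values so that aggregate functions can be applied to each group.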

Grouping Lists with groupby

Here's an example of how to use itertools.groupby with a list of dictionaries:

data = [
    {'name': 'Alice', 'age': 23},
    {'name': 'Bob', 'age': 21},
    {'name': 'Charlie', 'age': 23},
    {'name': 'David', 'age': 25},
    {'name': 'Edward', 'age': 23},
]

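from itertools import groupby

def by_age(person):
    return person['age']

# groupby only merges consecutive items with equal keys,
# so sort the list by the same key first.
sorted_data = sorted(data, key=by_age)

for age, group in groupby(sorted_data, key=by_age):
    print(age)
    for person in group:
        print(person['name'])

Output:

21
Bob
23
Alice
Charlie
Edward
25
David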

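In this example, we group the list of dictionaries by the 'age' key. The groupby function returns an iterator that yields (key, group) pairs, where key is the value shared by the items in the current group and group is an iterator over those items. groupby starts a new group each time the key changes, so items with equal keys end up in one group only if they are adjacent, which normally means sorting the input by the same key first.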
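To make the effect of input order concrete, here is a minimal sketch that runs itertools.groupby on the same records twice, once unsorted and once sorted by age. The helper names by_age and group_names are illustrative assumptions and not part of itertools.

```python
from itertools import groupby

data = [
    {'name': 'Alice', 'age': 23},
    {'name': 'Bob', 'age': 21},
    {'name': 'Charlie', 'age': 23},
    {'name': 'David', 'age': 25},
    {'name': 'Edward', 'age': 23},
]


def by_age(person):
    return person['age']


def group_names(people):
    """Return (age, names) pairs; hypothetical helper for illustration."""
    # Each group is a one-shot iterator, so turn it into a list right away.
    return [(age, [p['name'] for p in group])
            for age, group in groupby(people, key=by_age)]


# Unsorted input: one group per run of equal ages, so 23 appears three times.
unsorted_runs = group_names(data)
# Sorted input: one group per distinct age.
sorted_groups = group_names(sorted(data, key=by_age))

print(unsorted_runs)
print(sorted_groups)
```

With the unsorted list, every record forms its own group because no two neighbours share an age. After sorting, the three 23-year-olds land in a single group.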

Grouping DataFrames with groupby

Pandas is a popular Python library for data analysis. Its DataFrame.groupby method splits a DataFrame into groups of rows that share a value in one or more columns. Here's an example:

import pandas as pd

data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Edward'],
    'age': [23, 21, 23, 25, 23],
    'city': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Boston']
}

df = pd.DataFrame(data)

grouped = df.groupby('age')

for key, group in grouped:
    print(key)
    print(group)

Output:

21
   name  age         city
1   Bob   21  Los Angeles
23
      name  age       city
0    Alice   23   New York
2  Charlie   23    Chicago
4   Edward   23     Boston
25
    name  age     city
3  David   25  Houston

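In this example, we created a DataFrame and grouped it by the 'age' column using the groupby method. We then printed each group separately. Unlike itertools.groupby, the Pandas method does not need sorted input: rows with the same age are collected wherever they appear, and by default the groups are returned in sorted key order, which is why 21 is printed first.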
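Iterating is not the only way to inspect the groups. As a small sketch on the same sample data, get_group looks up a single group by its key and size counts the rows in each group. The variable names age_23 and group_sizes are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Edward'],
    'age': [23, 21, 23, 25, 23],
    'city': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Boston'],
})

grouped = df.groupby('age')

# Rows belonging to a single group, looked up by its key.
age_23 = grouped.get_group(23)
print(age_23)

# Number of rows in each group, indexed by the group key.
group_sizes = grouped.size()
print(group_sizes)
```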

Aggregate Functions with groupby

You can also use groupby with aggregate functions to perform calculations on each group. Here's an example with a Pandas DataFrame:

import pandas as pd

data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Edward'],
    'age': [23, 21, 23, 25, 23],
    'city': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Boston']
}

df = pd.DataFrame(data)

grouped = df.groupby('age')

median_age = grouped['age'].median()

print(median_age)

Output:

age
21    21.0
23    23.0
25    25.0
Name: age, dtype: float64

In this example, we used the groupby method to group the DataFrame by the 'age' column. We then used the median method to calculate the median age for each group. The result is a Series indexed by the group key, not a new DataFrame. Because the aggregated column is also the grouping key, each group's median simply equals its key.
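Aggregating a column other than the grouping key is usually more informative. The sketch below applies pandas named aggregation to the same sample data to count the people and the distinct cities in each age group. The output column names people and cities are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Edward'],
    'age': [23, 21, 23, 25, 23],
    'city': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Boston'],
})

# Named aggregation: output_column=(input_column, aggregation function).
summary = df.groupby('age').agg(
    people=('name', 'count'),    # rows per age group
    cities=('city', 'nunique'),  # distinct cities per age group
)

print(summary)
```

Here the result is a DataFrame with one row per age and one column per named aggregation.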

Conclusion

Both itertools.groupby and the Pandas groupby method are powerful tools for grouping data and performing aggregate functions on each group. Use itertools.groupby for plain iterables, remembering to sort by the key first, and DataFrame.groupby when your data is already in Pandas.