📅  Last modified: 2023-12-03 14:41:39.431000             🧑  Author: Mango
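Python offers two common groupby tools: itertools.groupby in the standard library, which groups consecutive items of an iterable by a key function, and the groupby method of a Pandas DataFrame, which groups rows by column values so that aggregate functions can be applied to each group.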
Here's an example of how to use itertools.groupby with a list of dictionaries:
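from itertools import groupby

data = [
    {'name': 'Alice', 'age': 23},
    {'name': 'Bob', 'age': 21},
    {'name': 'Charlie', 'age': 23},
    {'name': 'David', 'age': 25},
    {'name': 'Edward', 'age': 23},
]

# itertools.groupby only groups consecutive items, so sort by the same key first
data.sort(key=lambda x: x['age'])
groups = groupby(data, key=lambda x: x['age'])
for key, group in groups:
    print(key)
    # use a separate name so the loop does not shadow the data list
    for person in group:
        print(person['name'])
Output:
21
Bob
23
Alice
Charlie
Edward
25
David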
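In this example, we group the list of dictionaries by the 'age' key. The groupby function returns an iterator that produces (key, group) pairs, where key is the key value shared by the current group and group is an iterator over the items in that group. Note that itertools.groupby only groups consecutive items with equal keys, so the input must be sorted by the grouping key for each key to appear exactly once.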
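To make the role of sorting concrete, here is a minimal sketch comparing unsorted and sorted input. The names people, by_age, unsorted_groups and sorted_groups are illustrative and not part of the original example.

```python
from itertools import groupby
from operator import itemgetter

people = [
    {'name': 'Alice', 'age': 23},
    {'name': 'Bob', 'age': 21},
    {'name': 'Charlie', 'age': 23},
]

by_age = itemgetter('age')  # key function: returns record['age']

# Unsorted input: every run of consecutive equal keys becomes its own group,
# so age 23 shows up twice.
unsorted_groups = [
    (age, [p['name'] for p in grp])
    for age, grp in groupby(people, key=by_age)
]

# Sorted input: equal keys are adjacent, so each age appears exactly once.
sorted_groups = [
    (age, [p['name'] for p in grp])
    for age, grp in groupby(sorted(people, key=by_age), key=by_age)
]

print(unsorted_groups)  # [(23, ['Alice']), (21, ['Bob']), (23, ['Charlie'])]
print(sorted_groups)    # [(21, ['Bob']), (23, ['Alice', 'Charlie'])]
```

Each group is copied into a list right away because the group iterators share the underlying iterator with groupby and are no longer usable once the outer loop advances.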
Pandas is a popular Python library for data analysis. Its DataFrame.groupby method groups the rows of a DataFrame by the values of one or more columns. Here's an example:
import pandas as pd

data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Edward'],
    'age': [23, 21, 23, 25, 23],
    'city': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Boston']
}
df = pd.DataFrame(data)
grouped = df.groupby('age')
for key, group in grouped:
    print(key)
    print(group)
Output:
21
  name  age         city
1  Bob   21  Los Angeles
23
      name  age      city
0    Alice   23  New York
2  Charlie   23   Chicago
4   Edward   23    Boston
25
    name  age     city
3  David   25  Houston
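In this example, we created a DataFrame and grouped it by the 'age' column using the groupby method. We then printed each group separately. Unlike itertools.groupby, the Pandas method does not need sorted input, and by default it returns the groups sorted by key.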
You can also use groupby with aggregate functions to perform calculations on each group. Here's an example with a Pandas DataFrame:
import pandas as pd

data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Edward'],
    'age': [23, 21, 23, 25, 23],
    'city': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Boston']
}
df = pd.DataFrame(data)
grouped = df.groupby('age')
median_age = grouped['age'].median()
print(median_age)
Output:
age
21    21.0
23    23.0
25    25.0
Name: age, dtype: float64
In this example, we used the groupby method to group the DataFrame by the 'age' column. We then used the median method to calculate the median age for each group. The result is a Series indexed by age that holds the median age of each group. Because every row in a group shares the same age, the median here simply equals the group key.
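The same split, apply, combine idea can be sketched with the standard library alone, which may help clarify what a grouped aggregation computes. This is an illustrative sketch, not how Pandas is implemented; the helper aggregate_by and its parameters are hypothetical names introduced here.

```python
from collections import defaultdict
from statistics import median

def aggregate_by(records, key, value, func):
    """Group records by record[key], then apply func to each group's values.

    Hypothetical helper for illustration only.
    """
    buckets = defaultdict(list)
    for record in records:
        # split: collect record[value] under the record's group key
        buckets[record[key]].append(record[value])
    # apply + combine: one aggregated result per key, in sorted key order
    return {k: func(v) for k, v in sorted(buckets.items())}

people = [
    {'name': 'Alice', 'age': 23},
    {'name': 'Bob', 'age': 21},
    {'name': 'Charlie', 'age': 23},
    {'name': 'David', 'age': 25},
    {'name': 'Edward', 'age': 23},
]

median_age = aggregate_by(people, 'age', 'age', median)
group_size = aggregate_by(people, 'age', 'name', len)

print(median_age)  # {21: 21, 23: 23, 25: 25}
print(group_size)  # {21: 1, 23: 3, 25: 1}
```

Collecting values into a dictionary of lists does not require sorted input, which is one reason this pattern is often preferred over itertools.groupby when the data arrives in arbitrary order.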
Python's groupby tools, itertools.groupby and the Pandas groupby method, are powerful ways to group data and apply aggregate functions to each group. Whether you're working with lists or Pandas DataFrames, groupby can help you analyze your data more effectively.