📅 Last modified: 2023-12-03 15:03:28.360000 🧑 Author: Mango
In data analysis, it is often necessary to group data by an attribute and then perform operations on each group. The pandas `groupby()` function is an efficient way to do this. Sometimes, however, we only want to show specific columns of the grouped data.
In this tutorial, we will learn how to use the `groupby()` function in pandas to group data and then display only specific columns of the grouped data.
We will use the following sample data for the demonstration:
| Country | Year | GDP   | Population |
|---------|------|-------|------------|
| USA     | 2010 | 14624 | 309        |
| USA     | 2011 | 14964 | 312        |
| USA     | 2012 | 15497 | 314        |
| China   | 2010 | 6070  | 1339       |
| China   | 2011 | 7319  | 1347       |
| China   | 2012 | 8560  | 1355       |
First, import the pandas library and load the sample data into a pandas DataFrame:
```python
import pandas as pd

data = {
    'Country': ['USA', 'USA', 'USA', 'China', 'China', 'China'],
    'Year': [2010, 2011, 2012, 2010, 2011, 2012],
    'GDP': [14624, 14964, 15497, 6070, 7319, 8560],
    'Population': [309, 312, 314, 1339, 1347, 1355]
}
df = pd.DataFrame(data)
```
Next, group the data by an attribute such as 'Country' using the `groupby()` function:
```python
grouped = df.groupby('Country')
```
This groups the rows by the values in the 'Country' column and returns a GroupBy object. To display only specific columns of the grouped data, we can use the `apply()` function with a lambda function that selects the columns we want. For example, to display only the 'Year' and 'GDP' columns of the grouped data:
```python
result = grouped.apply(lambda x: x[['Year', 'GDP']])
```
The `apply()` function calls the lambda once for each group, passing that group as a DataFrame, and combines the results into a new DataFrame that contains only the 'Year' and 'GDP' columns from each group.
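To see what the lambda receives, you can iterate over the GroupBy object directly: each iteration yields a group key and the sub-DataFrame of rows for that key. The sketch below rebuilds the sample data and prints only the 'Year' and 'GDP' columns of each group; the names `country`, `group` and `subsets` are illustrative, not part of the original tutorial.

```python
import pandas as pd

# Same sample data as in the tutorial
df = pd.DataFrame({
    'Country': ['USA', 'USA', 'USA', 'China', 'China', 'China'],
    'Year': [2010, 2011, 2012, 2010, 2011, 2012],
    'GDP': [14624, 14964, 15497, 6070, 7319, 8560],
    'Population': [309, 312, 314, 1339, 1347, 1355],
})

grouped = df.groupby('Country')

# Iterating a GroupBy yields (key, sub-DataFrame) pairs, sorted by key by default
subsets = {}
for country, group in grouped:
    # Keep only the columns we want to show for this group
    subsets[country] = group[['Year', 'GDP']]
    print(country)
    print(subsets[country])
```

Each sub-DataFrame keeps its original row labels from `df` (3, 4 and 5 for China, 0, 1 and 2 for USA).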
To display the resulting dataframe, we can simply print it:
```python
print(result)
```
With pandas 2.0 or later, the output is:

```
           Year    GDP
Country
China   3  2010   6070
        4  2011   7319
        5  2012   8560
USA     0  2010  14624
        1  2011  14964
        2  2012  15497
```
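This DataFrame shows only the 'Year' and 'GDP' columns of the grouped data. The outer index level holds the group key ('Country'), and the inner level holds each row's original index label from `df`.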
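If you want one summary row per group rather than every row, a more direct option than `apply()` is to index the GroupBy object with a list of column names and then aggregate. This is an alternative to the tutorial's approach, not part of it, and `max()` is just one illustrative choice of aggregation; others such as `sum()` or `mean()` are used the same way.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['USA', 'USA', 'USA', 'China', 'China', 'China'],
    'Year': [2010, 2011, 2012, 2010, 2011, 2012],
    'GDP': [14624, 14964, 15497, 6070, 7319, 8560],
    'Population': [309, 312, 314, 1339, 1347, 1355],
})

# Select the columns on the GroupBy object, then aggregate each group
summary = df.groupby('Country')[['Year', 'GDP']].max()
print(summary)
```

Indexing with a list (`[['Year', 'GDP']]`) keeps a DataFrame result, while a single column name such as `['GDP']` gives a Series. Here each country appears once, with the largest `Year` and `GDP` value from its rows: 2012 and 8560 for China, 2012 and 15497 for USA.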
The pandas `groupby()` function is a powerful tool for grouping data. In this tutorial, we learned how to use `groupby()` to group data and then display only specific columns of the grouped data, using the `apply()` function with a lambda function to select the columns we wanted to display.