📅  Last modified: 2023-12-03 15:20:56.549000             🧑  Author: Mango
When working with data, it is often useful to know the frequency distribution of a categorical variable. This is where the value_counts() method in pandas comes in handy.
value_counts() is a pandas method that returns a Series containing the count of each unique value. Called on a DataFrame column (which is itself a Series), it reports how many times each value occurs, sorted by default from most to least frequent.
DataFrame['Column_name'].value_counts(dropna=True)
where:
DataFrame['Column_name'] selects the column on which you want to perform the operation.
dropna is a boolean parameter that controls whether NaN values are excluded from the counts. The default value is True.
You can also return the result as a DataFrame using the to_frame() method. The to_frame() method can be called on any Series object; it returns a new single-column DataFrame that keeps the row labels (index) of the original Series, and the column can then be renamed.
DataFrame['Column_name'].value_counts(dropna=True).to_frame()
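To illustrate the dropna parameter described above, here is a minimal sketch. The DataFrame df_missing and its values are made up for illustration; it simply compares the default behaviour with dropna=False.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a fruit column with two missing entries.
df_missing = pd.DataFrame(
    {'fruit': ['apple', 'orange', np.nan, 'orange', 'banana', np.nan, 'orange']}
)

# Default (dropna=True): missing values are left out of the counts.
counts_without_nan = df_missing['fruit'].value_counts()

# dropna=False: missing values get their own row in the result.
counts_with_nan = df_missing['fruit'].value_counts(dropna=False)

print(counts_without_nan)
print(counts_with_nan)
```

With the default, the counts add up to the number of non-missing rows; with dropna=False, they add up to the total number of rows.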
Let's suppose we have a DataFrame df with a column fruit containing values of different fruits.
import pandas as pd
df = pd.DataFrame({'fruit': ['apple', 'orange', 'banana', 'orange', 'orange', 'apple', 'banana']})
To get the frequency distribution of the fruit column, we can use the value_counts() method as follows:
freq_dist = df['fruit'].value_counts()
print(freq_dist)
Output:
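orange    3
apple     2
banana    2
Name: fruit, dtype: int64
(This is the output from pandas 1.x. In pandas 2.0 and later, the resulting Series is named count and its index is labeled fruit, so the header and the Name line differ; the counts themselves are the same.)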
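If proportions are more useful than raw counts, value_counts() also accepts a normalize parameter. The following is a small sketch using the same fruit data; the variable name rel_freq is only an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({'fruit': ['apple', 'orange', 'banana', 'orange', 'orange', 'apple', 'banana']})

# normalize=True returns each value's share of the total instead of its count.
rel_freq = df['fruit'].value_counts(normalize=True)

# Round only for display; rel_freq itself keeps full precision.
print(rel_freq.round(3))
```

Here orange accounts for 3 of the 7 rows, and the proportions sum to 1.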
We can also return the result as a DataFrame with named columns:
freq_dist_df = df['fruit'].value_counts().to_frame()
freq_dist_df.columns = ['Frequency']
print(freq_dist_df)
Output:
        Frequency
orange          3
apple           2
banana          2
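The to_frame() approach leaves the fruit names in the index. If two ordinary columns are preferred, one possible sketch is to name the index and then reset it. The variable name freq_table and the column label Frequency are illustrative choices, and this is only one of several ways to get the same table.

```python
import pandas as pd

df = pd.DataFrame({'fruit': ['apple', 'orange', 'banana', 'orange', 'orange', 'apple', 'banana']})

freq_table = (
    df['fruit']
    .value_counts()
    .rename_axis('fruit')            # label the index so it becomes a 'fruit' column
    .reset_index(name='Frequency')   # move the index into a column; name the counts column
)

print(freq_table)
```

Passing name='Frequency' to reset_index() sets the counts column label explicitly, so the result does not depend on what name the installed pandas version gives the Series returned by value_counts().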
In summary, the value_counts() method in pandas makes it easy to get the frequency distribution of a categorical variable. It is a must-have tool in a data analyst's toolbox.