Last modified: 2023-12-03 14:41:23.451000             Author: Mango
get_dummies is a function that is commonly used in Python for data analysis and preprocessing tasks. It is part of the pandas library and is used to convert categorical variables into dummy/indicator variables.
pd.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)
data: DataFrame or Series. The data to be converted.
prefix: str or list of str, default None. A string to prepend to the new column names, or a list of strings giving a separate prefix for each encoded column.
prefix_sep: str, default '_'. The separator placed between the prefix and the category value.
dummy_na: bool, default False. If True, add a column indicating NaN/null values; if False, NaN/null values are ignored.
columns: list-like, default None. Column names to encode. By default, all columns with object, string, or category dtype are encoded.
sparse: bool, default False. Whether the dummy-encoded columns should be sparse (True) or dense (False).
drop_first: bool, default False. Whether to produce k-1 dummies out of k categorical levels by removing the first level.
dtype: dtype, default bool (np.uint8 in pandas versions before 2.0). Data type for the new columns.

import pandas as pd
# create a Pandas DataFrame
df = pd.DataFrame({'Animal': ['cat', 'dog', 'dog', 'fish', 'cat', 'cat'], 'Legs': [4, 4, 4, 0, 4, 4]})
# encode the categorical variable 'Animal' using get_dummies
df_encoded = pd.get_dummies(df, columns=['Animal'])
print(df_encoded)
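Output (as printed by pandas versions before 2.0; newer versions print True/False unless dtype=int is passed):
   Legs  Animal_cat  Animal_dog  Animal_fish
0     4           1           0            0
1     4           0           1            0
2     4           0           1            0
3     0           0           0            1
4     4           1           0            0
5     4           1           0            0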
As we can see from the above example, the get_dummies() function has encoded the categorical variable 'Animal' into dummy variables. The prefix 'Animal' has been added to the column names to distinguish them from other columns in the data set.
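The drop_first and dtype parameters described earlier can be illustrated on the same data. The following is a minimal sketch, not part of the original example; the variable names full and reduced are chosen here for illustration, and dtype=int is passed so that the columns hold 0/1 regardless of the installed pandas version.

```python
import pandas as pd

# Same data as in the example above
df = pd.DataFrame({
    "Animal": ["cat", "dog", "dog", "fish", "cat", "cat"],
    "Legs": [4, 4, 4, 0, 4, 4],
})

# dtype=int requests integer 0/1 indicator columns
full = pd.get_dummies(df, columns=["Animal"], dtype=int)

# drop_first=True removes the first level ("cat"), leaving k-1 = 2 dummy columns
reduced = pd.get_dummies(df, columns=["Animal"], drop_first=True, dtype=int)

print(list(full.columns))
print(list(reduced.columns))
```

With drop_first=True, a row whose remaining dummy columns are all 0 corresponds to the dropped level, so no information is lost.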
get_dummies() is a powerful and convenient function in the pandas library for encoding categorical variables, making it easier to analyze and preprocess data for machine learning models.
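Real data often contains missing values, which is where dummy_na, prefix, and prefix_sep come in. The sketch below is an illustrative assumption rather than part of the original article: the Series animals and the prefix "pet" are made up for the demonstration.

```python
import numpy as np
import pandas as pd

# A Series with one missing value (hypothetical sample data)
animals = pd.Series(["cat", "dog", np.nan, "fish"])

# Default behaviour: the NaN row gets 0 in every dummy column
default = pd.get_dummies(animals, dtype=int)

# dummy_na=True adds an extra indicator column for missing values
with_na = pd.get_dummies(animals, dummy_na=True, dtype=int)

# prefix and prefix_sep control how the new column names are built
prefixed = pd.get_dummies(animals, prefix="pet", prefix_sep="-", dtype=int)

print(default)
print(with_na)
print(list(prefixed.columns))
```

Whether to keep an explicit missing-value column depends on the model: it is useful when missingness itself may carry information.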