📜  get_dummies - Python (1)

📅  Last modified: 2023-12-03 14:41:23.451000             🧑  Author: Mango

get_dummies in Python

get_dummies is a pandas function commonly used in data analysis and preprocessing. It converts categorical variables into dummy (indicator) variables, a transformation often called one-hot encoding.

Syntax
pd.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)
Parameters
  • data: array-like, Series, or DataFrame. The data to encode.
  • prefix: str, list of str, or dict of str, default None. String to prepend to the new column names. Pass a list (one entry per encoded column) or a dict mapping column names to prefixes to give each column its own prefix.
  • prefix_sep: str, default '_'. Separator placed between the prefix and the category value.
  • dummy_na: bool, default False. If True, add a column that indicates NaN values; if False, NaN values are ignored.
  • columns: list-like, default None. Names of the DataFrame columns to encode. If None, all columns with object, string, or category dtype are encoded.
  • sparse: bool, default False. Whether the dummy-encoded columns should be sparse (True) or dense (False).
  • drop_first: bool, default False. Whether to get k-1 dummies out of k categorical levels by removing the first level.
  • dtype: dtype, default None. Data type of the new columns. When left unset, pandas 2.0 and later produce bool columns; earlier versions produced np.uint8.
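
The effect of dummy_na and drop_first is easier to see on a small input. The following is a minimal sketch, assuming a made-up Series named animals with one missing value; dtype=int is passed only so that the indicators are integers regardless of the installed pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value
animals = pd.Series(['cat', 'dog', np.nan, 'fish'], name='Animal')

# Default behaviour: the NaN row is all zeros and gets no column of its own
default = pd.get_dummies(animals, dtype=int)

# dummy_na=True adds one extra indicator column for missing values
with_na = pd.get_dummies(animals, dummy_na=True, dtype=int)

# drop_first=True drops the first level ('cat'), leaving k-1 columns
reduced = pd.get_dummies(animals, drop_first=True, dtype=int)

print(default)
print(with_na)
print(reduced)
```

With drop_first=True, a row of all zeros stands for the dropped level, which is why this option is typically combined with models that would otherwise see perfectly collinear indicator columns.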
Example
import pandas as pd

# create a Pandas DataFrame
df = pd.DataFrame({'Animal': ['cat', 'dog', 'dog', 'fish', 'cat', 'cat'], 'Legs': [4, 4, 4, 0, 4, 4]})

# encode the categorical variable 'Animal' using get_dummies
df_encoded = pd.get_dummies(df, columns=['Animal'])

print(df_encoded)

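Output (as printed by pandas versions before 2.0; pandas 2.0 and later print True/False instead of 1/0 unless an integer dtype is passed):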

   Legs  Animal_cat  Animal_dog  Animal_fish
0     4           1           0            0
1     4           0           1            0
2     4           0           1            0
3     0           0           0            1
4     4           1           0            0
5     4           1           0            0
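
Whether the indicator columns print as 1/0 or True/False depends on the pandas version's default dtype. A small sketch of one way to get integer indicators on any version, assuming the same DataFrame as in the example and passing dtype=int explicitly:

```python
import pandas as pd

df = pd.DataFrame({'Animal': ['cat', 'dog', 'dog', 'fish', 'cat', 'cat'],
                   'Legs': [4, 4, 4, 0, 4, 4]})

# Request integer indicators explicitly so the result does not depend on
# the default dtype of the installed pandas version
df_int = pd.get_dummies(df, columns=['Animal'], dtype=int)

print(df_int)
```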

In this example, get_dummies() replaces the categorical column 'Animal' with one indicator column per category. Each new column name combines the prefix 'Animal' (the original column name, used by default) with the category value, which distinguishes the new columns from the rest of the data set. The numeric column 'Legs' is left unchanged.
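
The default prefix can be overridden with prefix and prefix_sep. The sketch below assumes a hypothetical DataFrame with two categorical columns; the prefixes 'a' and 'c' and the '.' separator are arbitrary choices for illustration. Because columns is not given, only the string columns are encoded.

```python
import pandas as pd

# Hypothetical data: two categorical columns and one numeric column
df = pd.DataFrame({
    'Animal': ['cat', 'dog', 'fish'],
    'Color': ['black', 'brown', 'gold'],
    'Legs': [4, 4, 0],
})

# A dict maps each encoded column to its own prefix;
# prefix_sep sets the separator between prefix and category value
encoded = pd.get_dummies(
    df,
    prefix={'Animal': 'a', 'Color': 'c'},
    prefix_sep='.',
    dtype=int,
)

print(encoded.columns.tolist())
```

The numeric 'Legs' column is passed through untouched and appears before the generated indicator columns.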

Conclusion

get_dummies() is a convenient pandas function for encoding categorical variables as indicator columns, a common preprocessing step before fitting machine learning models.