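Python Pandas – get_dummies() Method
pandas.get_dummies() is used for data manipulation. It converts categorical data into dummy (indicator) variables, producing one column per category.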
Syntax: pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)
Parameters:
- data: The array-like, Series, or DataFrame whose values are to be converted.
- prefix: String to prepend to the generated column names. When calling get_dummies on a DataFrame, pass a list with length equal to the number of columns being encoded, or a dictionary mapping column names to prefixes. Default value is None.
- prefix_sep: Separator placed between the prefix and the category value. Default is '_'.
- dummy_na: If True, adds a column that indicates NaN values. Default value is False, in which case NaNs are ignored.
- columns: Names of the DataFrame columns to encode. Default value is None, in which case all columns with object or category dtype are converted.
- sparse: Specifies whether the dummy-encoded columns should be backed by a SparseArray (True) or a regular NumPy array (False). Default value is False.
- drop_first: If True, removes the first level to get k-1 dummies out of k categorical levels. Default value is False.
- dtype: Data type for the new columns. Only a single dtype is allowed. The default was np.uint8 in pandas versions before 2.0 and is bool from pandas 2.0 onward.
Returns: DataFrame (dummy-coded data)
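The columns, drop_first and dtype parameters are not covered by the examples that follow, so here is a minimal sketch of how they can be combined. The DataFrame and its column names (color, size, price) are hypothetical sample data made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "size": ["S", "M", "S", "L"],
    "price": [10, 12, 9, 15],
})

# columns=["color"]: encode only "color"; "size" and "price" pass through unchanged.
# drop_first=True: drop the first level ("blue" in sorted order), leaving k-1 columns.
# dtype=int: request 0/1 integers instead of the version-dependent default.
encoded = pd.get_dummies(df, columns=["color"], drop_first=True, dtype=int)

print(encoded.columns.tolist())
# ['size', 'price', 'color_green', 'color_red']
print(encoded)
```

A row with 0 in both color_green and color_red stands for the dropped level, blue. Passing dtype explicitly keeps the result the same across pandas versions.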
Example 1:
Python3
import pandas as pd
con = pd.Series(list('abcba'))
print(pd.get_dummies(con))
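Output: a DataFrame with one indicator column for each distinct value in con (a, b and c) and one row per element. In each row, only the column matching that element's value is set.
Example 2:
Python3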
import pandas as pd
import numpy as np
# list
li = ['s', 'a', 't', np.nan]
print(pd.get_dummies(li))
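Output: a DataFrame with the columns a, s and t. The NaN entry gets no column of its own, so its row is unset in all three columns.
Example 3: (getting a NaN column)
Python3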
import pandas as pd
import numpy as np
# list
li = ['s', 'a', 't', np.nan]
print(pd.get_dummies(li, dummy_na=True))
Output: the same three columns a, s and t plus a NaN column, which is set only for the last row.
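The difference between Examples 2 and 3 can also be checked numerically. The following is a small sketch, not part of the original examples, that sums each row of the two results. dtype=int is passed so the sums are plain integers on any pandas version, and the variable names plain and with_na are chosen here for illustration.

```python
import numpy as np
import pandas as pd

li = ['s', 'a', 't', np.nan]

# Without dummy_na the NaN entry is ignored; with it, NaN gets its own column.
plain = pd.get_dummies(li, dtype=int)
with_na = pd.get_dummies(li, dummy_na=True, dtype=int)

# Row totals: a row sums to 1 when exactly one indicator is set.
print(plain.sum(axis=1).tolist())    # [1, 1, 1, 0]
print(with_na.sum(axis=1).tolist())  # [1, 1, 1, 1]
```

The zero in the first list is the NaN row, which matches no column unless dummy_na=True is given.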
Example 4:
Python3
import pandas as pd
# DataFrame built from a dictionary
diff = pd.DataFrame({'R': ['a', 'c', 'd'],
                     'T': ['d', 'a', 'c'],
                     'S_': [1, 2, 3]})
# One prefix per encoded column: R -> column1, T -> column2
print(pd.get_dummies(diff, prefix=['column1', 'column2']))
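Output: a DataFrame with the numeric column S_ left unchanged, followed by column1_a, column1_c and column1_d (from R) and column2_a, column2_c and column2_d (from T).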
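As a variation on Example 4, the prefix can also be given as a dictionary keyed by column name, and prefix_sep changes the separator. This is a minimal sketch reusing the same DataFrame. The prefixes first and second and the '-' separator are arbitrary choices for illustration.

```python
import pandas as pd

diff = pd.DataFrame({'R': ['a', 'c', 'd'],
                     'T': ['d', 'a', 'c'],
                     'S_': [1, 2, 3]})

# A dict ties each prefix to its column by name rather than by position,
# and prefix_sep replaces the default '_' separator.
named = pd.get_dummies(diff,
                       prefix={'R': 'first', 'T': 'second'},
                       prefix_sep='-')

print(named.columns.tolist())
# ['S_', 'first-a', 'first-c', 'first-d', 'second-a', 'second-c', 'second-d']
```

The dictionary form is less error-prone than a list when the order of the DataFrame's columns might change.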