
Last modified: 2022-05-13 01:54:20.059000    Author: Mango

How to Convert Categorical String Data into Numeric in Python?

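A dataset can contain both numerical and categorical features. Categorical features hold string data, which is easy for humans to understand, but most machine learning algorithms cannot work with categorical data directly. Categorical data therefore has to be converted into numerical data before further processing.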
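To make the distinction concrete, here is a minimal sketch on a small hypothetical DataFrame (the columns `Country`, `Age` and `Purchased` and their values are invented for illustration). It uses `DataFrame.select_dtypes` to separate the non-numeric (string) columns, which would need encoding, from the numeric ones:

```python
import pandas as pd

# Hypothetical toy data: 'Age' is numeric, 'Country' and 'Purchased' hold strings
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain"],
    "Age": [44, 27, 30, 38],
    "Purchased": ["No", "Yes", "No", "No"],
})

# Non-numeric (string) columns are the ones that need encoding
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()
numeric_cols = df.select_dtypes(include="number").columns.tolist()

print(df.dtypes)
print(categorical_cols)  # ['Country', 'Purchased']
print(numeric_cols)      # ['Age']
```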

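There are many ways to convert categorical data into numerical data. This article discusses the two most commonly used methods:

  • Dummy variable encoding
  • Label encoding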
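As a rough preview of how the two methods differ, the sketch below encodes one hypothetical `Color` column both ways. Dummy variable encoding produces one 0/1 column per category and implies no ordering; label encoding keeps a single column but assigns integers in sorted label order, which a model may read as an ordering that does not actually exist:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy column with three categories
colors = pd.Series(["red", "green", "blue", "green"], name="Color")

# Dummy variable encoding: one 0/1 column per category (blue, green, red)
dummies = pd.get_dummies(colors, dtype=int)
print(dummies)

# Label encoding: one integer per category, assigned in sorted label order
codes = LabelEncoder().fit_transform(colors)
print(codes)  # [2 1 0 1]
```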

Both methods use the same data; the link to the dataset is here.

Method 1: Dummy Variable Encoding

We will use the pandas.get_dummies function to convert the categorical string data into numbers.

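Syntax: pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)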
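A short sketch of a few of these parameters, assuming a hypothetical `Purchased` Series with the values `"No"` and `"Yes"`. `prefix` names the new columns, `drop_first=True` keeps only k-1 columns for k categories (the dropped category is implied when all remaining columns are 0), and `dtype=int` returns 0/1 integers instead of the boolean columns that pandas 2.0 and later produce by default:

```python
import pandas as pd

# Hypothetical 'Purchased' column with two categories
purchased = pd.Series(["No", "Yes", "No", "Yes"], name="Purchased")

# One 0/1 column per category, named '<prefix>_<value>'
full = pd.get_dummies(purchased, prefix="Purchased", dtype=int)
print(full.columns.tolist())  # ['Purchased_No', 'Purchased_Yes']

# drop_first=True keeps k-1 columns; 'No' is implied when Purchased_Yes == 0
reduced = pd.get_dummies(purchased, prefix="Purchased", drop_first=True, dtype=int)
print(reduced.columns.tolist())           # ['Purchased_Yes']
print(reduced["Purchased_Yes"].tolist())  # [0, 1, 0, 1]
```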

Step-by-step implementation

Step 1: Import the library

Python3
# importing pandas as pd
import pandas as pd


Step 2: Import the data

Python3

# importing data using .read_csv() function
df = pd.read_csv('data.csv')
 
# printing DataFrame
df

Output:

Step 3: Convert the categorical column to numerical values

We will convert the "Purchased" column from a categorical data type to a numerical data type.

Python3

# use .get_dummies() to convert the categorical
# column into 0/1 dummy columns; dtype=int is
# needed because pandas >= 2.0 returns booleans
# by default. The result is stored in df1
df1 = pd.get_dummies(df['Purchased'], dtype=int)
 
# concatenate df and df1 column-wise (axis=1)
# and store the result back in df
df = pd.concat([df, df1], axis=1)
 
# remove the original 'Purchased' column from df
# as it is no longer needed
df.drop('Purchased', axis=1, inplace=True)
 
# display df
df

Output:
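The concatenate-and-drop steps above can also be done in a single call by passing the whole DataFrame together with the `columns` parameter. The sketch below assumes a small hypothetical stand-in for `data.csv` (its columns and values are invented); note that, unlike the step-by-step version, the new columns are prefixed with the original column name:

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame read from data.csv
df = pd.DataFrame({
    "Age": [44, 27, 30],
    "Purchased": ["No", "Yes", "No"],
})

# columns=[...] encodes only the listed columns, drops the originals
# and keeps every other column unchanged (placed before the dummies)
encoded = pd.get_dummies(df, columns=["Purchased"], dtype=int)
print(encoded.columns.tolist())  # ['Age', 'Purchased_No', 'Purchased_Yes']
print(encoded)
```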

Method 2: Label Encoding

We will use LabelEncoder from the sklearn library to convert the categorical data into numerical data, calling its fit_transform() method in the process.

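Syntax: LabelEncoder().fit_transform(y), where y is an array-like of the labels to encode; it returns a NumPy array of integer codes.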
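To see which integer was assigned to which label, the fitted encoder's `classes_` attribute lists the labels in sorted order (a label's position in `classes_` is its code), and `inverse_transform()` maps codes back to the original strings. A minimal sketch with hypothetical labels:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels
labels = ["Yes", "No", "No", "Yes", "Maybe"]

le = LabelEncoder()
encoded = le.fit_transform(labels)

# Classes are sorted; each class's index in classes_ is its code
print(le.classes_.tolist())  # ['Maybe', 'No', 'Yes']
print(encoded.tolist())      # [2, 1, 1, 2, 0]

# inverse_transform maps integer codes back to the original strings
print(le.inverse_transform([0, 2]).tolist())  # ['Maybe', 'Yes']
```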

Step-by-step implementation

Step 1: Import the library

Python3

# importing pandas as pd
import pandas as pd

Step 2: Import the data

Python3

# importing data using .read_csv() function
df = pd.read_csv('data.csv')
 
# printing DataFrame
df

Output:

Step 3: Convert the categorical column to numerical values

We will convert the "Purchased" column from a categorical data type to a numerical data type.

Python3

# import LabelEncoder from scikit-learn's
# preprocessing module
from sklearn.preprocessing import LabelEncoder
 
# create an instance of LabelEncoder
le = LabelEncoder()
 
# fit the encoder on the 'Purchased' column and
# return the encoded labels as an integer array
label = le.fit_transform(df['Purchased'])
 
# display label
label

Output:

array([0, 1, 0, 0, 1, 1, 0, 1, 0, 1])
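Assuming the column holds string labels, the same sorted-label codes can also be obtained with pandas alone by converting the column to the `category` dtype and reading `.cat.codes`. This is a swapped-in alternative, not part of the method above; the sketch uses a hypothetical column and compares both results. One difference worth knowing is that missing values become `-1` in `.cat.codes`:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical column of string labels
purchased = pd.Series(["No", "Yes", "No", "No", "Yes"])

# pandas alternative: categories are sorted, .cat.codes gives each value's position
codes_pandas = purchased.astype("category").cat.codes.to_numpy()

# scikit-learn LabelEncoder for comparison
codes_sklearn = LabelEncoder().fit_transform(purchased)

print(codes_pandas.tolist())   # [0, 1, 0, 0, 1]
print(codes_sklearn.tolist())  # [0, 1, 0, 0, 1]
```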

Step 4: Append the label array to our DataFrame

Python3

# remove the original 'Purchased' column from df
# as it is no longer needed
df.drop("Purchased", axis=1, inplace=True)
 
# append the encoded array to the DataFrame
# as a new column named 'Purchased'
df["Purchased"] = label
 
# display the DataFrame
df

Output:
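scikit-learn's documentation describes `LabelEncoder` as intended for target values (y) rather than input features (X). For feature columns, `OrdinalEncoder` from the same `sklearn.preprocessing` module applies the same sorted-category encoding to several 2-D columns at once. A minimal sketch on hypothetical data:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical feature columns
X = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain"],
    "Purchased": ["No", "Yes", "No", "No"],
})

# OrdinalEncoder encodes several feature columns at once;
# categories are sorted per column, and the output is float by default
enc = OrdinalEncoder()
X_encoded = enc.fit_transform(X[["Country", "Purchased"]])

print(enc.categories_[0].tolist())  # ['France', 'Germany', 'Spain']
print(X_encoded.tolist())
# [[0.0, 0.0], [2.0, 1.0], [1.0, 0.0], [2.0, 0.0]]
```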