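How to convert categorical data to binary data in Python?
Categorical data is data that corresponds to a categorical variable. A categorical variable is a variable that takes its values from a fixed, finite set of possible values, for example gender, blood type, or whether someone owns a country house.
Characteristics of categorical data:
- It is mainly used in statistics.
- Numerical operations such as addition and subtraction cannot be performed on this kind of data.
- Every value of categorical data belongs to its set of categories.
- It is usually stored in an array data structure.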
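As a rough illustration of these properties, pandas has a dedicated `category` dtype. The sketch below uses made-up blood-type values (an assumption for illustration only). It shows the fixed set of categories pandas infers and the integer code it stores for each value.

```python
import pandas as pd

# A categorical variable takes its values from a fixed, finite set of categories.
blood_type = pd.Series(["A", "B", "O", "A", "AB"], dtype="category")

# The fixed set of allowed values; pandas infers them and sorts them.
categories = list(blood_type.cat.categories)
print(categories)  # ['A', 'AB', 'B', 'O']

# Internally, each value is stored as an integer code pointing into that set.
codes = blood_type.cat.codes.tolist()
print(codes)  # [0, 2, 3, 0, 1]
```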
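Binary data is data that takes one of two possible states or values, namely 0 and 1. It is used in many fields. In computer science it is known as the bit (binary digit), in digital electronics and mathematics it appears as truth values, and in statistics it is called a binary variable.
Characteristics:
- The values 0 and 1 are also called true and false, success and failure, yes and no, and so on.
- Binary data is discrete data, and it is also used in statistics.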
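Here is a small illustration using a made-up yes/no survey column (an assumption for illustration). The same two-state variable can be held as True/False or as 1/0. Converting between the two forms takes one comparison plus `astype(int)`.

```python
import pandas as pd

# Hypothetical yes/no survey answers: a variable with exactly two states.
answers = pd.Series(["yes", "no", "yes", "yes", "no"])

# The two states expressed as True/False ...
as_bool = answers == "yes"
print(as_bool.tolist())  # [True, False, True, True, False]

# ... and the same states expressed as 1/0.
as_int = as_bool.astype(int)
print(as_int.tolist())  # [1, 0, 1, 1, 0]
```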
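Converting Categorical Data to Binary Data
Our task is to convert categorical data to binary data in Python.
Step-by-step approach:
Step 1) To convert categorical data to binary data, we use some of the functions provided by the pandas library, so we first import pandas: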
Python3
# import required module
import pandas as pd
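One detail affects every step that follows: the dtype that `get_dummies()` returns. From pandas 2.0 onward the default is `bool`, so the indicator columns print as True/False. Earlier releases used `uint8` (0/1). The sketch below assumes any pandas version that supports the `dtype` parameter (available since pandas 0.23). It prints the installed version and compares the default dtype with an explicit `dtype=int`. No output is shown because it depends on the installed version.

```python
import pandas as pd

# The default dtype of get_dummies() depends on the pandas version:
# bool (True/False) since pandas 2.0, uint8 (0/1) in earlier releases.
print(pd.__version__)
sample = pd.Series(["Male", "Female"])
default_dtypes = pd.get_dummies(sample).dtypes.tolist()
print(default_dtypes)

# Passing dtype=int asks for integer 0/1 indicator columns explicitly.
int_dtypes = pd.get_dummies(sample, dtype=int).dtypes.tolist()
print(int_dtypes)
```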
Step 2) Next, create a list and enter the data as shown below. Each inner list holds one person's name and gender.
Python3
# import required modules
import pandas as pd
# assign data
data = [["Jagroop", "Male"], ["Praveen", "Male"],
["Harjot", "Female"], ["Pooja", "Female"],
["Mohit", "Male"]]
Step 3) After creating a DataFrame from this list with pd.DataFrame(), we add one extra line, print(data_frame), to display the categorical data, as shown below:
Python3
# import required modules
import pandas as pd
# assign data
data = [["Jagroop", "Male"], ["Praveen", "Male"],
["Harjot", "Female"], ["Pooja", "Female"],
["Mohit", "Male"]]
# display categorical output
data_frame = pd.DataFrame(data, columns=["Name", "Gender"])
print(data_frame)
Output:
      Name  Gender
0  Jagroop    Male
1  Praveen    Male
2   Harjot  Female
3    Pooja  Female
4    Mohit    Male
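To connect this output back to the definition of categorical data, you can cast the Gender column to pandas' `category` dtype. This makes its fixed set of possible values explicit. The check is optional and not part of the original steps. The data is repeated so the sketch runs on its own.

```python
import pandas as pd

data = [["Jagroop", "Male"], ["Praveen", "Male"],
        ["Harjot", "Female"], ["Pooja", "Female"],
        ["Mohit", "Male"]]
data_frame = pd.DataFrame(data, columns=["Name", "Gender"])

# Gender is read as plain strings; casting it to "category" exposes
# the fixed set of values the column can take.
gender = data_frame["Gender"].astype("category")
print(list(gender.cat.categories))  # ['Female', 'Male']

# How often each category occurs.
counts = gender.value_counts().to_dict()
print(counts)
```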
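Step 4) By the end of step 3 we have the categorical data; now we convert it to binary data. For this we use the built-in pandas function get_dummies(), as shown below:
Here we apply get_dummies() only to the Gender column, because Gender is the only column we want to convert from categorical to binary data.
Python3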
# import required modules
import pandas as pd
# assign data
data = [["Jagroop", "Male"], ["Praveen", "Male"],
["Harjot", "Female"], ["Pooja", "Female"],
["Mohit", "Male"]]
# display categorical output
data_frame = pd.DataFrame(data, columns=["Name", "Gender"])
print(data_frame)
# converting to binary data (dtype=int gives 0/1 instead of True/False in pandas 2.x)
df_one = pd.get_dummies(data_frame["Gender"], dtype=int)
print(df_one)
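This gives binary output for the Gender column only. There are now two ways to use it:
- Add the above output to the data frame -> drop the Gender column -> drop the Female column (if we want Male = 1 and Female = 0) -> rename Male to Gender -> display the converted output.
- Add the above output to the data frame -> drop the Gender column -> drop the Male column (if we want Male = 0 and Female = 1) -> rename Female to Gender -> display the converted output.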
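Both options can also be sketched directly on the Gender values, which are re-created below so the example is self-contained. `get_dummies()` sorts the categories, so the indicator columns come out as Female and then Male. Option 1 can therefore use the `drop_first=True` parameter, which drops the Female column. Option 2 simply keeps the Female column. This sketch is an alternative to the drop-and-rename steps, not the article's own code.

```python
import pandas as pd

gender = pd.Series(["Male", "Male", "Female", "Female", "Male"])

# Option 1 (Male = 1, Female = 0): categories are sorted, so
# drop_first=True drops "Female" and keeps the "Male" indicator column.
option_1 = pd.get_dummies(gender, drop_first=True, dtype=int)["Male"]
print(option_1.tolist())  # [1, 1, 0, 0, 1]

# Option 2 (Male = 0, Female = 1): keep only the "Female" indicator column.
option_2 = pd.get_dummies(gender, dtype=int)["Female"]
print(option_2.tolist())  # [0, 0, 1, 1, 0]
```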
The program below uses the second option (Male = 0 and Female = 1):
Python3
# import required modules
import pandas as pd
# assign data
data = [["Jagroop", "Male"], ["Praveen", "Male"],
["Harjot", "Female"], ["Pooja", "Female"],
["Mohit", "Male"]]
# display categorical output
data_frame = pd.DataFrame(data, columns=["Name", "Gender"])
# print(data_frame)
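# converting to binary data (dtype=int gives 0/1 instead of True/False in pandas 2.x)
df_one = pd.get_dummies(data_frame["Gender"], dtype=int)
# print(df_one)
# add the binary columns to the data frame
df_two = pd.concat((df_one, data_frame), axis=1)
# drop the original Gender column and the Male column (Male = 0, Female = 1)
df_two = df_two.drop(["Gender"], axis=1)
df_two = df_two.drop(["Male"], axis=1)
# rename Female to Gender and display the result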
result = df_two.rename(columns={"Female": "Gender"})
print(result)
Output:
   Gender     Name
0       0  Jagroop
1       0  Praveen
2       1   Harjot
3       1    Pooja
4       0    Mohit
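As an alternative to concatenating and dropping columns by hand, `get_dummies()` can encode a column inside the whole DataFrame through its `columns` parameter. It keeps Name and replaces Gender with the prefixed columns `Gender_Female` and `Gender_Male`. The sketch below reproduces option 2 this way. Note one difference: the resulting column order is Name, Gender rather than Gender, Name.

```python
import pandas as pd

data = [["Jagroop", "Male"], ["Praveen", "Male"],
        ["Harjot", "Female"], ["Pooja", "Female"],
        ["Mohit", "Male"]]
data_frame = pd.DataFrame(data, columns=["Name", "Gender"])

# With columns=, get_dummies() keeps Name and replaces Gender with
# prefixed indicator columns.
encoded = pd.get_dummies(data_frame, columns=["Gender"], dtype=int)
print(encoded.columns.tolist())  # ['Name', 'Gender_Female', 'Gender_Male']

# Option 2: keep Female as 1 and Male as 0 under the name Gender.
result = (encoded.drop(columns="Gender_Male")
                 .rename(columns={"Gender_Female": "Gender"}))
print(result)
```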
Below is the complete program based on the above approach:
Python3
# pandas is imported in order to use the built-in
# functions available in the pandas library
import pandas as pd
# Data is initialized here
data = [["Jagroop", "Male"], ["Parveen", "Male"],
["Harjot", "Female"], ["Pooja", "Female"],
["Mohit", "Male"]]
# A DataFrame is created with the columns Name and Gender
data_frame = pd.DataFrame(data, columns=["Name", "Gender"])
# The Gender data is converted into binary data
# (dtype=int gives 0/1 rather than True/False in pandas 2.x)
df_one = pd.get_dummies(data_frame["Gender"], dtype=int)
# The binary data is concatenated with the DataFrame
df_two = pd.concat((df_one, data_frame), axis=1)
# The original Gender column is dropped
df_two = df_two.drop(["Gender"], axis=1)
# We want Male = 0 and Female = 1, so we drop the Male column here
df_two = df_two.drop(["Male"], axis=1)
# Rename the Column
result = df_two.rename(columns={"Female": "Gender"})
# Print the Result
print(result)
Output:
   Gender     Name
0       0  Jagroop
1       0  Parveen
2       1   Harjot
3       1    Pooja
4       0    Mohit
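The steps of the complete program can also be wrapped in a small reusable function. The sketch below uses a plain comparison instead of `get_dummies()`, which works when the column has exactly two values. `to_binary` and its `positive` parameter are hypothetical names chosen for illustration. The helper assumes no missing values (a NaN counts as an extra value and triggers the error), and it keeps the original column order.

```python
import pandas as pd


def to_binary(df, column, positive):
    """Hypothetical helper: return a copy of df where `column` is 1 for
    `positive` and 0 for the other value.

    Assumes the column holds exactly two distinct values and no NaN.
    """
    values = set(df[column].unique())
    if len(values) != 2 or positive not in values:
        raise ValueError(f"{column!r} must hold two values including {positive!r}")
    out = df.copy()
    out[column] = (out[column] == positive).astype(int)
    return out


data = [["Jagroop", "Male"], ["Parveen", "Male"],
        ["Harjot", "Female"], ["Pooja", "Female"],
        ["Mohit", "Male"]]
data_frame = pd.DataFrame(data, columns=["Name", "Gender"])

# Female = 1 and Male = 0, matching the option used in the program above.
result = to_binary(data_frame, "Gender", positive="Female")
print(result)
```

Because the helper works on a copy, `data_frame` itself still holds the original text values.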