📜  How to convert categorical data to binary data in Python?

📅  Last modified: 2022-05-13 01:55:31.114000             🧑  Author: Mango

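Categorical data is data that corresponds to a categorical variable. A categorical variable is a variable that takes one of a fixed, finite set of possible values, for example gender, blood group, or whether someone owns a country house.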

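Characteristics of categorical data:

  • It is used mainly in statistics.
  • Arithmetic operations such as addition and subtraction are not meaningful on this kind of data.
  • Every value of categorical data belongs to one of the categories.
  • It is usually stored in an array-like data structure.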

Example:

[Figure: categorical data]
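
As a rough illustration of the characteristics above, the sketch below stores a categorical variable in a pandas Series. The blood-group values are hypothetical sample data, not taken from the article. It shows that the variable has a fixed set of categories and that counting, rather than arithmetic, is the natural operation.

```python
import pandas as pd

# Hypothetical sample values for a categorical variable (blood group)
blood_groups = pd.Series(["A", "O", "B", "A", "AB", "O"], dtype="category")

# The fixed, finite set of possible values
categories = list(blood_groups.cat.categories)
print(categories)

# Counting occurrences is meaningful for categorical data; adding values is not
counts = blood_groups.value_counts()
print(counts)
```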

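Binary data is data that takes one of exactly two possible states or values, conventionally 0 and 1. It appears in many fields under different names: a bit (binary digit) in computer science, a truth value in digital electronics and mathematical logic, and a binary variable in statistics.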

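Characteristics of binary data:

  • The pair (0, 1) is also read as (false, true), (failure, success), (no, yes), and so on.
  • Binary data is discrete data and is also used in statistics.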

Example:

[Figure: binary data]
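
To make the idea concrete, here is a minimal sketch in plain Python that encodes two-state answers as 0 and 1. The survey answers are hypothetical sample data.

```python
# Hypothetical yes/no survey answers
answers = ["yes", "no", "yes", "yes", "no"]

# Encode each answer as a binary value: 1 for "yes", 0 for "no"
binary = [1 if answer == "yes" else 0 for answer in answers]
print(binary)

# Python's bool is a subclass of int, so True/False behave like 1/0
print(int(True), int(False))

# Summing a 0/1 encoding counts the "yes" answers
yes_count = sum(binary)
print(yes_count)
```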

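Converting categorical data to binary data

Our task is to convert categorical data to binary data in Python.

Step-by-step approach:

Step 1) To convert categorical data to binary data, we use functions provided by the pandas library, so we begin by importing pandas.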

Python3
# import required module
import pandas as pd


Step 2) Next, create a list and fill it with the data shown below.

Python3

# import required modules
import pandas as pd
 
# assign data
data = [["Jagroop", "Male"], ["Praveen", "Male"],
        ["Harjot", "Female"], ["Pooja", "Female"],
        ["Mohit", "Male"]]

Step 3) Next, create the data frame with pd.DataFrame() and add a print(data_frame) call to display the categorical data:

Python3

# import required modules
import pandas as pd
 
# assign data
data = [["Jagroop", "Male"], ["Praveen", "Male"],
        ["Harjot", "Female"], ["Pooja", "Female"],
        ["Mohit", "Male"]]
 
# display categorical output
data_frame = pd.DataFrame(data, columns=["Name", "Gender"])
print(data_frame)

Output:

[Figure: categorical data output]

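Step 4) Up to Step 3 we have the categorical data; now we convert it to binary data using the pandas built-in function get_dummies(), as shown below.

Here we apply get_dummies() only to the Gender column, because that is the only column we want to convert from categorical to binary.

Python3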

# import required modules
import pandas as pd
 
# assign data
data = [["Jagroop", "Male"], ["Praveen", "Male"],
        ["Harjot", "Female"], ["Pooja", "Female"],
        ["Mohit", "Male"]]
 
# display categorical output
data_frame = pd.DataFrame(data, columns=["Name", "Gender"])
print(data_frame)
 
# converting to binary data
# (dtype=int gives 0/1; pandas 2.x returns True/False by default)
df_one = pd.get_dummies(data_frame["Gender"], dtype=int)
print(df_one)

[Figure: output of Step 4]
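
As a side note, get_dummies() creates one indicator column per distinct category, so a variable with more than two categories produces more than two columns. The sketch below uses a hypothetical BloodGroup series that is not part of the article's data, and it passes dtype=int on the assumption that 0/1 output is wanted rather than booleans.

```python
import pandas as pd

# Hypothetical categorical column with three distinct values
blood = pd.Series(["A", "B", "O", "A"], name="BloodGroup")

# One 0/1 indicator column per category
dummies = pd.get_dummies(blood, dtype=int)
print(dummies)

# Exactly one indicator is set in every row
row_sums = dummies.sum(axis=1).tolist()
print(row_sums)
```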

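The Step 4 output contains binary columns for the Gender column only. From here, there are two ways to use it:

  1. Append the output above to the data frame -> drop the Gender column -> drop the Female column (if we want Male = 1 and Female = 0) -> rename Male to Gender -> display the converted output.
  2. Append the output above to the data frame -> drop the Gender column -> drop the Male column (if we want Male = 0 and Female = 1) -> rename Female to Gender -> display the converted output.

The program below uses the second option (Male = 0, Female = 1):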

Python3

# import required modules
import pandas as pd
 
# assign data
data = [["Jagroop", "Male"], ["Praveen", "Male"],
        ["Harjot", "Female"], ["Pooja", "Female"],
        ["Mohit", "Male"]]
 
# display categorical output
data_frame = pd.DataFrame(data, columns=["Name", "Gender"])
# print(data_frame)
 
# converting to binary data
# (dtype=int gives 0/1; pandas 2.x returns True/False by default)
df_one = pd.get_dummies(data_frame["Gender"], dtype=int)
# print(df_one)
 
# display result
df_two = pd.concat((df_one, data_frame), axis=1)
df_two = df_two.drop(["Gender"], axis=1)
df_two = df_two.drop(["Male"], axis=1)
result = df_two.rename(columns={"Female": "Gender"})
print(result)

Output:

[Figure: converted output]
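
For comparison, the first option listed earlier (Male = 1, Female = 0) could be sketched as follows. It reuses the same data and the same sequence of calls, but keeps the Male indicator instead of the Female one. The name result_male is introduced here only for illustration.

```python
import pandas as pd

data = [["Jagroop", "Male"], ["Praveen", "Male"],
        ["Harjot", "Female"], ["Pooja", "Female"],
        ["Mohit", "Male"]]
data_frame = pd.DataFrame(data, columns=["Name", "Gender"])

# Indicator columns for Gender: "Female" and "Male"
df_one = pd.get_dummies(data_frame["Gender"], dtype=int)

# Option 1: keep the Male indicator, so Male = 1 and Female = 0
df_two = pd.concat((df_one, data_frame), axis=1)
df_two = df_two.drop(["Gender", "Female"], axis=1)
result_male = df_two.rename(columns={"Male": "Gender"})
print(result_male)
```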

Below is the complete program based on the approach above:

Python3

# pandas is imported so that we can use its built-in functions
import pandas as pd
 
# Data is initialized here
data = [["Jagroop", "Male"], ["Praveen", "Male"],
        ["Harjot", "Female"], ["Pooja", "Female"],
        ["Mohit", "Male"]]
 
# Data frame is created under column name Name and Gender
data_frame = pd.DataFrame(data, columns=["Name", "Gender"])
 
# The Gender column is converted into binary data
# (dtype=int gives 0/1; pandas 2.x returns True/False by default)
df_one = pd.get_dummies(data_frame["Gender"], dtype=int)
 
# Binary Data is Concatenated into Dataframe
df_two = pd.concat((df_one, data_frame), axis=1)
 
# The original Gender column is dropped
df_two = df_two.drop(["Gender"], axis=1)

# We want Male = 0 and Female = 1, so we drop the Male column here
df_two = df_two.drop(["Male"], axis=1)
 
# Rename the Column
result = df_two.rename(columns={"Female": "Gender"})
 
# Print the Result
print(result)

Output:

[Figure: converted output]
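
When a column has exactly two categories, the same 0/1 encoding can also be obtained without get_dummies(). The sketch below is an alternative rather than the article's method: it maps each category to a value with Series.map(), using an explicit mapping chosen here to match the Male = 0, Female = 1 convention. The name encoded is introduced only for illustration.

```python
import pandas as pd

data = [["Jagroop", "Male"], ["Praveen", "Male"],
        ["Harjot", "Female"], ["Pooja", "Female"],
        ["Mohit", "Male"]]
data_frame = pd.DataFrame(data, columns=["Name", "Gender"])

# Work on a copy so the original categorical data stays available
encoded = data_frame.copy()

# Explicit mapping: Male = 0, Female = 1
encoded["Gender"] = encoded["Gender"].map({"Male": 0, "Female": 1})
print(encoded)
```

Unlike the concat-based program, this keeps the original column order (Name, then Gender). Any value missing from the mapping would become NaN, so the mapping needs to cover every category.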