How to automatically convert to the best data types in Pandas?

Last modified: 2022-05-13 01:54:52.222000   Author: Mango

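Prerequisites: Pandas

When you load or create a Series or DataFrame, pandas infers a data type for the Series or for each column. By default these are mostly NumPy-backed types such as int64, float64 and object.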
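As a quick illustration of this default inference, the sketch below builds a small DataFrame and prints the data types pandas assigned. The column names and values are hypothetical and serve only as sample data.

```python
import pandas as pd

# Hypothetical sample data, used only to illustrate default inference.
df = pd.DataFrame({
    "qty": [1, 2, 3],            # whole numbers
    "price": [1.5, 2.0, 3.25],   # decimals
    "label": ["a", "b", "c"],    # text
})

print(df.dtypes)
# "qty" is inferred as int64 and "price" as float64.
# "label" is object on pandas 1.x and 2.x; newer releases may
# infer a dedicated string dtype instead.
```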

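We will use the pandas convert_dtypes() function to automatically convert these default data types to the best possible ones. A major benefit of convert_dtypes() is that the resulting types (such as Int64, string and boolean) support the missing-value marker pd.NA, so a column with missing values does not have to fall back to float64 or object with NaN. The function has been available since pandas 1.0.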
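The benefit for missing values is easiest to see on an integer column with a gap. The following is a minimal sketch with made-up values: by default the column is stored as float64 because NumPy integers cannot hold NaN, while convert_dtypes() produces a nullable Int64 column that marks the gap with pd.NA.

```python
import numpy as np
import pandas as pd

# Whole numbers with one missing value: pandas stores this as float64
# by default, because a NumPy integer array cannot represent NaN.
s = pd.Series([1, 2, np.nan])
print(s.dtype)               # float64

# convert_dtypes() returns a new Series with the nullable Int64 dtype.
converted = s.convert_dtypes()
print(converted.dtype)       # Int64

# The missing entry is now pd.NA and the other values are integers again.
print(converted.iloc[2] is pd.NA)   # True
```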

Syntax:

For a Series:

series_name.convert_dtypes()

For a DataFrame (the trailing .dtypes only displays the resulting type of each column):

dataframe_name.convert_dtypes().dtypes
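Beyond the bare call, convert_dtypes() accepts keyword arguments such as convert_string, convert_integer and convert_boolean that switch individual conversions off. The sketch below, which uses a hypothetical two-column DataFrame, assumes you want to keep integer columns on their NumPy dtype.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

# Default call: every column is converted where possible.
full = df.convert_dtypes()
print(full.dtypes)

# Opt out of the integer conversion: "id" stays int64.
partial = df.convert_dtypes(convert_integer=False)
print(partial.dtypes)
```

Both calls return a new DataFrame, so assign the result if you want to keep working with the converted types.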

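Below are implementations for both a Series and a DataFrame:

Converting the data type of a Series:

  • Import the module
  • Create a Series
  • Use the convert_dtypes() function to convert the data type automatically

Example: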

Python3
# importing packages
import pandas as pd
  
# creating a series
s = pd.Series(['Geeks', 'for', 'Geeks'])
  
# printing the series
print("SERIES")
print(s)
  
print()
  
# using convert_dtypes() function
print("AFTER DATATYPE CONVERSION")
print(s.convert_dtypes())


Output: the values are unchanged, but the dtype reported for the Series changes from object to string.
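To check the result of the Series example programmatically rather than by reading the printed footer, you can inspect the dtype attribute directly. This is a minimal sketch; the exact dtype text printed depends on the pandas version, so no fixed output is shown for it.

```python
import pandas as pd

s = pd.Series(['Geeks', 'for', 'Geeks'])
converted = s.convert_dtypes()

# dtype before and after the conversion
print(s.dtype)
print(converted.dtype)

# The result uses the pandas string extension dtype ...
print(isinstance(converted.dtype, pd.StringDtype))   # True

# ... and it is a new object: the conversion does not modify s in place.
print(converted is s)                                # False
```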

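Converting the data types of a DataFrame:

  • Import the modules
  • Create a DataFrame
  • Check the data types
  • Convert the data types with convert_dtypes() and inspect them through .dtypes

The data type of each column changes accordingly. A DataFrame has no single data type of its own, because it holds several columns that can each have a different type. The "dtype: object" line printed at the bottom of df.dtypes describes the Series that lists the column types, not the DataFrame's data.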
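The sketch below illustrates that point with a hypothetical two-column DataFrame: .dtypes returns a Series indexed by column name, and that Series itself has dtype object because its values are dtype objects.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"Roll_No.": [1, 2, 3], "Marks": [90.33, 30.6, 75.0]})
dtypes = df.convert_dtypes().dtypes

print(type(dtypes).__name__)   # Series
print(dtypes.dtype)            # object

# The per-column types are the values of that Series.
print(dtypes["Roll_No."])      # Int64
print(dtypes["Marks"])         # Float64
```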

Example:

Python3

import numpy as np
import pandas as pd

# creating a dataframe; np.nan marks the missing values
df = pd.DataFrame({"Roll_No.": [1, 2, 3],
                   "Name": ["Raj", "Ritu", "Rohan"],
                   "Result": ["Pass", "Fail", np.nan],
                   "Promoted": [True, False, np.nan],
                   "Marks": [90.33, 30.6, np.nan]})

# printing the dataframe
# (display(df) is only available in Jupyter/IPython, so print is used here)
print("PRINTING DATAFRAME")
print(df)

# checking datatype
print()
print("PRINTING DATATYPE")
print(df.dtypes)

# converting datatype
print()
print("AFTER CONVERTING DATATYPE")
print(df.convert_dtypes().dtypes)

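Output: Roll_No. changes from int64 to Int64, Name and Result from object to string, Promoted from object to boolean, and Marks from float64 to Float64.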
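If you prefer to see the types before and after conversion side by side instead of in two separate printouts, one option is to collect both .dtypes results in a small table. This is a sketch of that idea, not part of the original example; the name comparison is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Roll_No.": [1, 2, 3],
                   "Name": ["Raj", "Ritu", "Rohan"],
                   "Result": ["Pass", "Fail", np.nan],
                   "Promoted": [True, False, np.nan],
                   "Marks": [90.33, 30.6, np.nan]})

# One row per column of df: its dtype before and after the conversion.
comparison = pd.DataFrame({"before": df.dtypes,
                           "after": df.convert_dtypes().dtypes})
print(comparison)
```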

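Creating a DataFrame from Series with specified data types:

  • Import the modules
  • Create the DataFrame from Series objects, specifying a data type for each
  • Check the data types
  • Convert with convert_dtypes() and inspect the result through .dtypes

Example: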

Python3

import numpy as np
import pandas as pd

# Creating the DataFrame from Series objects,
# each with an explicitly specified data type
df = pd.DataFrame({
    # Column_1 datatype is int32
    "Column_1": pd.Series([1, 2, 3], dtype=np.dtype("int32")),
    # Column_2 datatype is object (abbreviated "O")
    "Column_2": pd.Series(["Apple", "Ball", "Cat"],
                          dtype=np.dtype("object")),
    # Column_3 datatype is object (abbreviated "O")
    "Column_3": pd.Series([True, False, np.nan],
                          dtype=np.dtype("object")),
    # Column_4 datatype is float64
    "Column_4": pd.Series([10, np.nan, 20],
                          dtype=np.dtype("float")),
    # Column_5 datatype is float64
    "Column_5": pd.Series([np.nan, 100.5, 200],
                          dtype=np.dtype("float")),
})

# printing dataframe
# (display(df) is only available in Jupyter/IPython, so print is used here)
print("PRINTING DATAFRAME")
print(df)

# checking datatype
print()
print("CHECKING DATATYPE")
print(df.dtypes)

# convert datatype
print()
print("AFTER DATATYPE CONVERSION")
print(df.convert_dtypes().dtypes)

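Output: Column_1 becomes Int32, Column_2 string, Column_3 boolean, Column_4 Int64 (its non-missing values are all whole numbers), and Column_5 Float64.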
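The two float columns end up with different types, which can be surprising at first. The sketch below isolates that behaviour using the same values as Column_4, Column_5 and Column_1; the variable names are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

# Float storage, but every non-missing value is a whole number.
whole = pd.Series([10, np.nan, 20], dtype="float")
# Float storage with a genuinely fractional value.
fractional = pd.Series([np.nan, 100.5, 200], dtype="float")
# A 32-bit integer column.
narrow = pd.Series([1, 2, 3], dtype="int32")

print(whole.convert_dtypes().dtype)       # Int64
print(fractional.convert_dtypes().dtype)  # Float64
print(narrow.convert_dtypes().dtype)      # Int32
```

A float column is turned into a nullable integer only when none of its values would lose information, and an integer column keeps its original bit width.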