Python | Pandas DataFrame.astype()
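Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages, and it makes importing and analyzing data much easier.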
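The DataFrame.astype() method is used to cast a pandas object to a specified dtype. The astype() function also provides the capability to convert any suitable existing column to categorical type.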
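As a minimal sketch of the categorical conversion, assuming a small hypothetical Position column rather than the article's nba.csv data:

```python
import pandas as pd

# Hypothetical sample data (not from nba.csv): a column with repeated labels.
df = pd.DataFrame({"Position": ["PG", "SF", "PG", "C", "SF"]})

# Convert the existing column to pandas' categorical dtype.
df["Position"] = df["Position"].astype("category")

print(df["Position"].dtype)                   # category
print(df["Position"].cat.categories.tolist())  # ['C', 'PG', 'SF']
print(df["Position"].cat.codes.tolist())       # [1, 2, 1, 0, 2]
```

The categories are the sorted unique values, and each row is stored as a small integer code pointing into that list.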
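The DataFrame.astype() function comes in very handy when we want to cast a particular column's data type to another data type. Not only that, we can also pass a Python dictionary to change more than one column's type at once: the keys of the dictionary are column labels, and the values are the new data types we want those columns to have.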
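A minimal sketch of the dictionary form, assuming a small hypothetical DataFrame whose numeric columns were read in as strings (the column names and values are illustrative only):

```python
import pandas as pd

# Hypothetical data: numbers stored as strings, as often happens after parsing text.
df = pd.DataFrame({
    "Team": ["Team A", "Team B"],
    "Age": ["25", "27"],
    "Salary": ["7730337", "1148640"],
})

# Keys are column labels, values are target dtypes; unlisted columns keep their dtype.
converted = df.astype({"Age": "int64", "Salary": "float64"})

print(converted.dtypes)
```

astype() returns a new object, so df itself still holds the original string columns unless the result is assigned back.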
Syntax: DataFrame.astype(dtype, copy=True, errors='raise', **kwargs)
Parameters:
dtype : Use a numpy.dtype or Python type to cast the entire pandas object to the same type. Alternatively, use {col: dtype, …}, where col is a column label and dtype is a numpy.dtype or Python type, to cast one or more of the DataFrame's columns to column-specific types.
copy : Return a copy when copy=True (be very careful setting copy=False, as changes to values may then propagate to other pandas objects).
errors : Control raising of exceptions on invalid data for the provided dtype.
raise : allow exceptions to be raised
ignore : suppress exceptions. On error, return the original object
kwargs : keyword arguments to pass on to the constructor
Returns: casted : same type as caller
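To illustrate the two errors settings, here is a minimal sketch assuming a hypothetical Jersey column in which one value cannot be parsed as an integer:

```python
import pandas as pd

# Hypothetical column holding one value that is not a valid integer.
df = pd.DataFrame({"Jersey": ["0", "99", "n/a"]})

# errors='raise' (the default): the invalid value stops the cast with an exception.
try:
    df.astype({"Jersey": "int64"})
except (ValueError, TypeError) as exc:
    print("errors='raise' ->", type(exc).__name__)

# errors='ignore': the exception is suppressed and the data comes back uncast.
same = df.astype({"Jersey": "int64"}, errors="ignore")
print("errors='ignore' ->", same["Jersey"].tolist())  # ['0', '99', 'n/a']
```

With errors='ignore' nothing signals that the cast failed, so it is worth checking the resulting dtype afterwards.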
For a link to the CSV file used in the code, click here.
Example #1: Convert the data type of the Weight column.
# importing pandas as pd
import pandas as pd
# Making data frame from the csv file
df = pd.read_csv("nba.csv")
# Printing the first 10 rows of
# the data frame for visualization
df[:10]
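Since the data has some NaN values, and NaN cannot be stored in an integer column, we will drop all rows containing any NaN value to avoid errors during the cast.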
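A minimal sketch of why the NaN rows matter, assuming a hypothetical Series of weights with one missing value. It also shows pandas' nullable "Int64" dtype, an alternative that this article does not use:

```python
import numpy as np
import pandas as pd

# Hypothetical weights with one missing value (the NaN makes the column float64).
weight = pd.Series([180.0, np.nan, 205.0])

# A plain int64 column has no slot for NaN, so this cast raises a ValueError
# (recent pandas versions raise IntCastingNaNError, a ValueError subclass).
try:
    weight.astype("int64")
except ValueError as exc:
    print("cannot cast:", exc)

# The approach used in this article: drop the missing rows first.
print(weight.dropna().astype("int64").tolist())  # [180, 205]

# Alternative: the nullable integer dtype keeps the gap as <NA>.
print(weight.astype("Int64"))
```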
# drop all those rows which
# have any 'nan' value in it.
df.dropna(inplace = True)
# let's find out the data type of Weight column
# (.iloc[0] takes the first row by position, since
# dropna() may have removed the row labelled 0)
before = type(df.Weight.iloc[0])
# Now we will convert it into 'int64' type.
df.Weight = df.Weight.astype('int64')
# let's find out the data type after casting
after = type(df.Weight.iloc[0])
# print the value of before
print(before)
# print the value of after
print(after)
Output:
# print the data frame and see
# what it looks like after the change
df
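Because nba.csv may not be at hand, here is a self-contained sketch of the same steps, assuming a hypothetical three-row stand-in for the file. It checks the column's dtype directly and shows that casting float to int truncates the fractional part rather than rounding:

```python
import pandas as pd

# Hypothetical stand-in for nba.csv so the sketch runs without the file.
df = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Weight": [180.0, None, 235.7],
})

# Drop rows with missing values, as in Example #1.
df.dropna(inplace=True)
print(df["Weight"].dtype)      # float64

df["Weight"] = df["Weight"].astype("int64")
print(df["Weight"].dtype)      # int64
print(df["Weight"].tolist())   # [180, 235]  (235.7 is truncated, not rounded)
```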
Example #2: Change the data type of more than one column at once.
Change the Name column to categorical type and the Age column to int64 type.
# importing pandas as pd
import pandas as pd
# Making data frame from the csv file
df = pd.read_csv("nba.csv")
# Drop the rows with 'nan' values
df = df.dropna()
# print the existing data type of each column
df.info()
Output:
Now let's change the data type of both columns at once.
# Passed a dictionary to astype() function
df = df.astype({"Name":'category', "Age":'int64'})
# Now print the data type
# of all columns after change
df.info()
Output:
# print the data frame
# too after the change
df
Output:
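One detail of the dictionary form is worth checking: every key must be an existing column label. A minimal sketch, assuming a hypothetical two-column DataFrame and a deliberately misspelled key:

```python
import pandas as pd

# Hypothetical two-column frame.
df = pd.DataFrame({"Name": ["A", "B"], "Age": [25.0, 27.0]})

# A key that is not a column label raises KeyError instead of being skipped.
try:
    df.astype({"Agee": "int64"})  # deliberate typo in the column name
except KeyError as exc:
    print("KeyError:", exc)

# Columns left out of the mapping simply keep their current dtype.
out = df.astype({"Age": "int64"})
print(out.dtypes)
```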