Python | Pandas DataFrame.dropna()

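Python is a great language for data analysis, primarily because of its ecosystem of data-centric packages. Pandas is one of those packages, and it makes importing and analyzing data much easier.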

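Sometimes a CSV file contains null values, which later show up as NaN in the data frame. The Pandas dropna() method lets the user examine and drop rows/columns with null values in different ways.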
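As a minimal sketch of how empty CSV fields become NaN, the snippet below reads a small in-memory CSV instead of a file. The CSV text and its column names are made up for illustration and are not part of the data set used later.

```python
import io

import pandas as pd

# Hypothetical CSV text; the empty fields stand in for missing values.
csv_text = "name,team,age\nAnn,Red,31\nBob,,27\nCid,Blue,\n"

# read_csv turns empty fields into NaN by default.
df = pd.read_csv(io.StringIO(csv_text))

print(df)
# Number of missing values in each column.
print(df.isna().sum())
```

Calling isna() before dropna() is a cheap way to see where the gaps are before deciding what to drop.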

Syntax:

DataFrameName.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

Parameters:

axis: takes an int or string value for rows/columns. Input can be 0 or 1 as an integer, or 'index' or 'columns' as a string. 0/'index' drops rows; 1/'columns' drops columns.
how: takes one of two string values ('any' or 'all'). 'any' drops the row/column if ANY value is null; 'all' drops it only if ALL values are null.
thresh: takes an integer giving the minimum number of non-NA values a row/column must have to be kept. Rows/columns with fewer non-NA values are dropped.
subset: a list of labels along the other axis that limits which columns (when dropping rows) or rows (when dropping columns) are checked for null values.
inplace: a boolean. If True, the changes are made in the data frame itself and the method returns None.
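The parameters above can be compared side by side on a tiny data frame. This is a sketch only: the frame and its column names (a, b, c) are invented for illustration. Note that recent pandas versions do not allow how and thresh to be passed in the same call, so each is shown separately.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 is complete, row 1 has one NaN,
# row 2 is entirely NaN, row 3 has only one non-NA value.
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 4.0],
    "b": [1.0, 2.0, np.nan, np.nan],
    "c": [1.0, 2.0, np.nan, np.nan],
})

# how="any": drop a row if any of its values is NaN.
any_rows = df.dropna(how="any")

# how="all": drop a row only if all of its values are NaN.
all_rows = df.dropna(how="all")

# thresh=2: keep only rows with at least 2 non-NA values.
thresh_rows = df.dropna(thresh=2)

# subset=["a"]: only column "a" is checked for NaN.
subset_rows = df.dropna(subset=["a"])

print(list(any_rows.index))     # [0]
print(list(all_rows.index))     # [0, 1, 3]
print(list(thresh_rows.index))  # [0, 1]
print(list(subset_rows.index))  # [0, 3]
```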


The examples below use a CSV file named nba.csv.

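Example #1: Dropping rows with at least 1 null value

A data frame is read and all rows with any null values are dropped. The sizes of the old and new data frames are compared to see how many rows had at least 1 null value.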

# importing pandas module
import pandas as pd

# making data frame from csv file
data = pd.read_csv("nba.csv")

# making new data frame with rows containing any NA value dropped
new_data = data.dropna(axis=0, how="any")

# comparing sizes of data frames
print("Old data frame length:", len(data),
      "\nNew data frame length:", len(new_data),
      "\nNumber of rows with at least 1 NA value: ",
      len(data) - len(new_data))

Output:

Old data frame length:  458 
New data frame length:  364 
Number of rows with at least 1 NA value:  94

Since the difference is 94, there were 94 rows with at least 1 null value in some column.
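The same count can also be obtained without dropping anything, by asking which rows contain a null value. The sketch below uses a small made-up frame (columns x and y are hypothetical) rather than nba.csv, and checks that both approaches agree.

```python
import numpy as np
import pandas as pd

# Hypothetical data: rows 1, 2 and 3 each contain at least one null value.
df = pd.DataFrame({
    "x": [1.0, np.nan, 3.0, np.nan],
    "y": ["a", "b", None, None],
})

# Count rows where any column is null.
rows_with_na = int(df.isna().any(axis=1).sum())

# Count via the length difference, as in the example above.
length_diff = len(df) - len(df.dropna())

print(rows_with_na, length_diff)  # 3 3
```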

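Example #2: Changing the axis and using the how and inplace parameters

Two data frames are made. A column in which every value is None is added to the new data frame. The column names are printed to verify that the null column was inserted correctly. Then the number of columns is compared before and after dropping the null column.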

# importing pandas module
import pandas as pd

# making data frame from csv file
data = pd.read_csv("nba.csv")

# making a copy of old data frame
new = data.copy()

# creating a column with all null values in new data frame
new["Null Column"] = None

# checking if column is inserted properly
print(data.columns.values, "\n", new.columns.values)

# comparing number of columns before dropping null column
print("\nColumn number before dropping Null column\n",
      len(data.columns), len(new.columns))

# dropping columns in which all values are null, modifying new in place
new.dropna(axis=1, how="all", inplace=True)

# comparing number of columns after dropping null column
print("\nColumn number after dropping Null column\n",
      len(data.columns), len(new.columns))

Output:

['Name' 'Team' 'Number' 'Position' 'Age' 'Height' 'Weight' 'College'
 'Salary'] 
 ['Name' 'Team' 'Number' 'Position' 'Age' 'Height' 'Weight' 'College'
 'Salary' 'Null Column']

Column number before dropping Null column
 9 10

Column number after dropping Null column
 9 9
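One detail worth noting about inplace=True is that dropna() then returns None, so its result should not be assigned back to a variable. The following sketch illustrates this on a small made-up frame (the names and values are hypothetical) and shows the equivalent reassignment style.

```python
import pandas as pd

# Hypothetical frame with one column that is entirely null.
df = pd.DataFrame({"Name": ["Ann", "Bob"], "Null Column": [None, None]})

# With inplace=True the frame is modified and the call returns None.
result = df.dropna(axis=1, how="all", inplace=True)
print(result)            # None
print(list(df.columns))  # ['Name']

# Equivalent without inplace: dropna returns a new frame to assign.
original = pd.DataFrame({"Name": ["Ann", "Bob"], "Null Column": [None, None]})
reassigned = original.dropna(axis="columns", how="all")
print(list(reassigned.columns))  # ['Name']
print(list(original.columns))    # ['Name', 'Null Column']
```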