Drop rows with missing values or NaN in columns from a Pandas DataFrame

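Pandas provides various data structures and operations for manipulating numerical data and time series. In practice, however, some data may be missing. In Pandas, missing data is represented by two values: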

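None: None is a Python singleton object that is often used to represent missing data in Python code.
NaN: NaN (short for Not a Number) is a special floating-point value recognized by all systems that use the standard IEEE floating-point representation.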

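Pandas treats None and NaN as essentially interchangeable for indicating missing or null values. To remove null values from a DataFrame, we use the dropna() function, which drops the rows or columns that contain null values in different ways depending on its arguments.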
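As a minimal sketch of this interchangeability (the Series and DataFrame below are made up for illustration), isna() flags both None and NaN as missing, and dropna() removes rows containing either one:

```python
import numpy as np
import pandas as pd

# Illustrative data: one None and one NaN among valid numbers
s = pd.Series([1.0, None, np.nan, 4.0])

# In a float Series, None is converted to NaN on construction,
# so both entries are reported as missing
missing_mask = s.isna()
print(missing_mask.tolist())   # [False, True, True, False]

# None in a column of strings is also treated as missing
df = pd.DataFrame({"name": ["a", None, "c"],
                   "score": [1.0, 2.0, np.nan]})

# Only the first row has no missing value
print(len(df.dropna()))        # 1
```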

Syntax:
DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

Parameters:
axis: Determines whether rows or columns are dropped. Accepts 0 or 'index' to drop rows, and 1 or 'columns' to drop columns.
how: Accepts only 'any' or 'all'. 'any' drops the row/column if any value is null; 'all' drops it only if all values are null.
thresh: An integer giving the minimum number of non-null values a row/column must have to be kept; rows/columns with fewer non-null values are dropped.
subset: A label or list of labels along the other axis to consider; for example, when dropping rows, the columns to check for null values.
inplace: A boolean. If True, the DataFrame itself is modified and None is returned; otherwise a new DataFrame is returned.
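The subset, thresh, and inplace parameters can be sketched as follows. The DataFrame and its column names A, B, and C are hypothetical and chosen only to make the effect of each parameter easy to see:

```python
import numpy as np
import pandas as pd

# Illustrative data, not taken from the article's examples
df = pd.DataFrame({
    "A": [1.0, np.nan, np.nan, 4.0],
    "B": [np.nan, np.nan, 3.0, 4.0],
    "C": [1.0, np.nan, 3.0, 4.0],
})

# subset: only column "A" is checked when deciding which rows to drop
by_subset = df.dropna(subset=["A"])
print(by_subset.index.tolist())   # [0, 3]

# thresh: keep rows that have at least 2 non-missing values
by_thresh = df.dropna(thresh=2)
print(by_thresh.index.tolist())   # [0, 2, 3]

# inplace=True modifies the DataFrame itself and returns None
copy_df = df.copy()
result = copy_df.dropna(how="all", inplace=True)
print(result, copy_df.index.tolist())   # None [0, 2, 3]
```

Using subset is the usual way to drop rows based on missing values in particular columns only, while leaving missing values in other columns untouched.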

Code #1: Drop rows with at least one null value.

# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists (named scores to avoid shadowing the built-in dict)
scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, np.nan, 45, 56],
          'Third Score': [52, 40, 80, 98],
          'Fourth Score': [np.nan, np.nan, np.nan, 65]}

# creating a dataframe from dictionary
df = pd.DataFrame(scores)

df

Now we drop the rows that contain at least one NaN (null) value.

# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists (named scores to avoid shadowing the built-in dict)
scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, np.nan, 45, 56],
          'Third Score': [52, 40, 80, 98],
          'Fourth Score': [np.nan, np.nan, np.nan, 65]}

# creating a dataframe from dictionary
df = pd.DataFrame(scores)

# using dropna() function
df.dropna()

Output: only row 3 remains, because it is the only row with no NaN value.

Code #2: Drop rows in which all values are missing.

# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists (named scores to avoid shadowing the built-in dict)
scores = {'First Score': [100, np.nan, np.nan, 95],
          'Second Score': [30, np.nan, 45, 56],
          'Third Score': [52, np.nan, 80, 98],
          'Fourth Score': [np.nan, np.nan, np.nan, 65]}

# creating a dataframe from dictionary
df = pd.DataFrame(scores)

df

Now we drop the rows in which every value is missing (NaN).

# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists (named scores to avoid shadowing the built-in dict)
scores = {'First Score': [100, np.nan, np.nan, 95],
          'Second Score': [30, np.nan, 45, 56],
          'Third Score': [52, np.nan, 80, 98],
          'Fourth Score': [np.nan, np.nan, np.nan, 65]}

# creating a dataframe from dictionary
df = pd.DataFrame(scores)

# using dropna() function
df.dropna(how='all')

Output: row 1, whose values are all NaN, is dropped; rows 0, 2, and 3 remain.

Code #3: Drop columns with at least one null value.

# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists (named scores to avoid shadowing the built-in dict)
scores = {'First Score': [100, np.nan, np.nan, 95],
          'Second Score': [30, np.nan, 45, 56],
          'Third Score': [52, np.nan, 80, 98],
          'Fourth Score': [60, 67, 68, 65]}

# creating a dataframe from dictionary
df = pd.DataFrame(scores)

df

Now we drop the columns that have at least one missing value.

# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists (named scores to avoid shadowing the built-in dict)
scores = {'First Score': [100, np.nan, np.nan, 95],
          'Second Score': [30, np.nan, 45, 56],
          'Third Score': [52, np.nan, 80, 98],
          'Fourth Score': [60, 67, 68, 65]}

# creating a dataframe from dictionary
df = pd.DataFrame(scores)

# using dropna() function
df.dropna(axis=1)

Output: only the 'Fourth Score' column remains, because it is the only column with no missing value.

Code #4: Drop rows with at least one null value from a CSV file.

Note: This example reads a CSV file, employees.csv, which must be present in the working directory.

# importing pandas module
import pandas as pd

# making data frame from csv file
data = pd.read_csv("employees.csv")

# making new data frame with dropped NA values
new_data = data.dropna(axis=0, how='any')

new_data

Now we compare the lengths of the two DataFrames to find out how many rows had at least one null value.

print("Old data frame length:", len(data))
print("New data frame length:", len(new_data)) 
print("Number of rows with at least 1 NA value: ",
      (len(data)-len(new_data)))

Output:

Old data frame length: 1000
New data frame length: 764
Number of rows with at least 1 NA value:  236

Since the difference is 236, there were 236 rows with at least one null value in some column.
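The same count can also be obtained directly, without building a second DataFrame, by combining isna() with any(axis=1). The sketch below uses a small in-memory DataFrame as a stand-in for the CSV file; its column names and values are hypothetical and are not the contents of employees.csv:

```python
import numpy as np
import pandas as pd

# Small stand-in for the CSV data; column names and values are hypothetical
data = pd.DataFrame({
    "First Name": ["Ann", None, "Raj", "Lee"],
    "Salary": [5000.0, 6200.0, np.nan, 4100.0],
    "Team": ["Ops", "Sales", "Ops", None],
})

# True for each row that contains at least one missing value
rows_with_na = data.isna().any(axis=1)
n_missing_rows = int(rows_with_na.sum())

# The same number obtained by comparing lengths before and after dropna()
n_by_length = len(data) - len(data.dropna())

print(n_missing_rows, n_by_length)   # 3 3
```

The boolean mask rows_with_na can also be used to inspect the affected rows, for example with data[rows_with_na], before deciding whether to drop them.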