How to drop columns with NaN values in a Pandas DataFrame?

Last modified: 2022-05-13 01:55:10.006000             Author: Mango

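NaN (Not a Number) is a special floating-point value that marks missing or undefined data. It cannot be converted directly to other types such as int, so a column that contains it is stored as float (or object). Missing values can distort an analysis, and one common way to handle them is to drop the columns that contain them. This article shows how to drop columns with NaN values from a Pandas DataFrame using pandas.DataFrame.dropna() with axis=1.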
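Before dropping anything, it can help to see which columns would be affected. The following is a minimal sketch, assuming a small hypothetical frame (the column names `a`, `b` and `c` are illustrative and do not come from the examples in this article):

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [4.0, 5.0, 6.0],
    "c": [pd.NaT, pd.Timestamp("2022-01-01"), pd.Timestamp("2022-01-02")],
})

# isna() marks NaN and NaT alike; any() reduces each column to one boolean
has_missing = df.isna().any()
cols_with_missing = has_missing[has_missing].index.tolist()

# axis=1 tells dropna() to drop columns instead of rows (the default)
cleaned = df.dropna(axis=1)
```

Here `cols_with_missing` lists the columns that `dropna(axis=1)` removes, so the two can be compared as a sanity check.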

Example 1: Drop all columns that contain any NaN/NaT value.

Python3
# Importing libraries
import pandas as pd
import numpy as np
  
# Creating a dictionary
dit = {'August': [pd.NaT, 25, 34, np.nan, 1.1, 10],
       'September': [4.8, pd.NaT, 68, 9.25, np.nan, 0.9],
       'October': [78, 5.8, 8.52, 12, 1.6, 11], }
  
# Converting it to data frame
df = pd.DataFrame(data=dit)
  
# DataFrame
df


Output:

Python3

# Dropping the columns having NaN/NaT values
df = df.dropna(axis=1)
  
df

Output:

In the above example, the 'August' and 'September' columns are dropped because they contain NaN and NaT values; only 'October' remains.
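The word "any" in this example reflects the default `how="any"` of `dropna()`. As a rough sketch of the difference from `how="all"`, assuming a hypothetical frame whose column names are chosen only for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'partial' has one gap, 'empty' is entirely missing
df = pd.DataFrame({
    "full": [1.0, 2.0, 3.0],
    "partial": [1.0, np.nan, 3.0],
    "empty": [np.nan, np.nan, np.nan],
})

# Default how="any": a single missing value is enough to drop the column
any_dropped = df.dropna(axis=1, how="any")

# how="all": only columns that are missing in every row are dropped
all_dropped = df.dropna(axis=1, how="all")
```

With `how="any"` only `full` survives, while `how="all"` keeps both `full` and `partial`.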

Example 2: Drop all columns that contain any NaN/NaT value, then reset the index with df.reset_index().

Python3

# Importing libraries
import pandas as pd
import numpy as np
  
# Initializing the nested list with Data set
player_list = [['M.S.Dhoni', 36, 75, 5428000], 
               [np.nan, 36, 74, np.nan],
               ['V.Kholi', 31, 70, 8428000],
               ['S.Smith', 34, 80, 4428000], 
               [pd.NaT, 39, 100, np.nan],
               [np.nan, 33, 90.5, 7028000],
               ['K.Peterson', 42, 85, pd.NaT]]
  
# creating a pandas dataframe
df = pd.DataFrame(player_list, columns=['Name', 'Age', 
                                        'Weight', 'Salary'])
  
df

Output:

Python3

# Dropping the columns having NaN/NaT values
df = df.dropna(axis=1)
  
# Resetting the indices using df.reset_index()
df = df.reset_index(drop=True)
  
df

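Output:

In the above example, the 'Name' and 'Salary' columns are dropped and the index is then reset. Because dropping columns does not remove any rows, reset_index(drop=True) leaves the default row index unchanged here.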
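To see when resetting the index actually changes something, the sketch below contrasts dropping columns with dropping rows. It assumes a small hypothetical frame; the names `x` and `y` are not part of the example above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame for illustration
df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [4.0, 5.0, 6.0]})

# Dropping columns keeps every row, so the row index is untouched
by_column = df.dropna(axis=1)

# Dropping rows removes label 1 and leaves a gap in the index
by_row = df.dropna(axis=0)

# reset_index(drop=True) renumbers the remaining rows from 0
by_row_reset = by_row.reset_index(drop=True)
```

Only the row-wise result has a gap for `reset_index()` to close; after a column-wise drop the call is harmless but has no visible effect.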

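Example 3: Drop all columns that contain any NaN/NaT value from a country dataset, then reset the index.

Python3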

# Importing libraries
import pandas as pd
import numpy as np
  
# creating and initializing a nested list
age_list = [[np.nan, 1952, 8425333, np.nan, 28.35], 
            ['Australia', 1957, 9712569, 'Oceania', 24.26],
            ['Brazil', 1962, 76039390, np.nan, 30.24],
            [pd.NaT, 1957, 637408000, 'Asia', 28.32], 
            ['France', 1957, 44310863, pd.NaT, 25.21],
            ['India', 1952, 3.72e+08, pd.NaT, 27.36], 
            ['United States', 1957, 171984000, 'Americas', 28.98]]
  
# creating a pandas dataframe
df = pd.DataFrame(age_list, columns=[
                  'Country', 'Year', 'Population', 'Continent', 'lifeExp'])
  
df

Output:

Python3

# Dropping the columns having NaN/NaT values
df = df.dropna(axis=1)
  
# Resetting the indices using df.reset_index()
df = df.reset_index(drop=True)
  
df

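Output:

In the above example, the 'Country' and 'Continent' columns are dropped because they contain NaN and NaT values.

Example 4: Use the subset parameter to drop only the columns that have a NaN/NaT value in a given row label.

Python3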

# Importing libraries 
import pandas as pd 
import numpy as np 
  
# Creating a dictionary 
dit = {'August': [10, np.nan, 34, 4.85, 71.2, 1.1], 
       'September': [np.nan, 54, 68, 9.25, pd.NaT, 0.9], 
       'October': [np.nan, 5.8, 8.52, np.nan, 1.6, 11],
       'November': [pd.NaT, 5.8, 50, 8.9, 77, pd.NaT] }
  
# Converting it to data frame
df = pd.DataFrame(data=dit)
  
# data frame
df

Output:

Python3

# Dropping the columns that have NaN/NaT values in the row
# labelled 3; with axis=1, the 'subset' parameter takes row labels
df = df.dropna(subset=[3], axis=1)
  
# Resetting the indices using df.reset_index()
df = df.reset_index(drop=True)
  
df

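Output:

In the above example, subset=[3] restricts the check to the row labelled 3. 'October' is the only column with a missing value in that row, so it is the only column dropped.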
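One way to read `subset` with `axis=1` is as a boolean column selection driven by a single row. The sketch below reuses the data from Example 4; the variable names `via_subset` and `via_mask` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Same data as in Example 4
df = pd.DataFrame({
    "August": [10, np.nan, 34, 4.85, 71.2, 1.1],
    "September": [np.nan, 54, 68, 9.25, pd.NaT, 0.9],
    "October": [np.nan, 5.8, 8.52, np.nan, 1.6, 11],
    "November": [pd.NaT, 5.8, 50, 8.9, 77, pd.NaT],
})

# With axis=1, subset lists row labels: only row 3 is inspected
via_subset = df.dropna(axis=1, subset=[3])

# Comparable selection: keep columns whose value in row 3 is present
via_mask = df.loc[:, df.loc[3].notna()]
```

Both approaches keep 'August', 'September' and 'November', even though those columns still contain missing values in other rows.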