How to Drop Columns with NaN Values in a Pandas DataFrame?
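NaN (Not a Number) is a special floating-point value that pandas uses to mark missing data. For datetime-like data, the equivalent marker is NaT (Not a Time). In data analysis, these missing values are often unwanted and may have to be removed before a dataset can be analysed properly. In this article, we discuss how to drop the columns of a pandas DataFrame that contain NaN values. The DataFrame.dropna() method handles this: when called with axis=1, it drops columns rather than rows.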
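The examples below mix np.nan with pd.NaT, so a quick check of how pandas recognises missing values may help. This is a minimal sketch that uses only standard NumPy and pandas calls. The Series s is made up for illustration:

```python
import numpy as np
import pandas as pd

# np.nan is an ordinary Python float
print(isinstance(np.nan, float))   # True

# NaN is not equal even to itself, so == cannot be used to find it
print(np.nan == np.nan)            # False

# pandas treats np.nan, None and pd.NaT all as missing values
s = pd.Series([1.5, np.nan, None, pd.NaT], dtype=object)
print(s.isna().tolist())           # [False, True, True, True]
```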
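Syntax: DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
Parameters:
axis: 0 or 'index' drops rows; 1 or 'columns' drops columns.
how: 'any' drops a row/column if it contains at least one missing value; 'all' drops it only if every value is missing.
thresh: keep only the rows/columns that have at least this many non-missing values.
subset: labels along the other axis to check, i.e. column labels when dropping rows and row labels when dropping columns.
inplace: if True, modify the DataFrame in place and return None instead of a new DataFrame.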
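A small sketch can show how axis, how and thresh affect which columns survive. It uses a made-up frame in which the columns a, b and c are hypothetical: a is complete, b is partly missing, and c is entirely missing.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "a" complete, "b" partly missing, "c" all missing
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [np.nan, 5.0, 6.0],
    "c": [np.nan, np.nan, np.nan],
})

# axis=1, how="any" (default): drop every column with at least one NaN
print(df.dropna(axis=1).columns.tolist())             # ['a']

# axis=1, how="all": drop only the columns where every value is NaN
print(df.dropna(axis=1, how="all").columns.tolist())  # ['a', 'b']

# thresh=2: keep columns that have at least 2 non-missing values
print(df.dropna(axis=1, thresh=2).columns.tolist())   # ['a', 'b']

# axis=0 (default) drops rows instead; row 0 is missing "b"
print(df[["a", "b"]].dropna().index.tolist())         # [1, 2]
```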
Example 1: Drop all columns that contain any NaN/NaT value.
Python3
# Importing libraries
import pandas as pd
import numpy as np
# Creating a dictionary
dit = {'August': [pd.NaT, 25, 34, np.nan, 1.1, 10],
'September': [4.8, pd.NaT, 68, 9.25, np.nan, 0.9],
'October': [78, 5.8, 8.52, 12, 1.6, 11], }
# Converting it to data frame
df = pd.DataFrame(data=dit)
# DataFrame
df
Output:
Python3
# Dropping the columns having NaN/NaT values
df = df.dropna(axis=1)
df
Output:
In the example above, the 'August' and 'September' columns are dropped because they contain NaN and NaT values.
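To check in advance which columns dropna(axis=1) would remove, you can combine isna() with any(). The sketch below reuses the Example 1 data. Selecting the complete columns with a boolean mask is included only as an equivalent alternative, not as the article's method:

```python
import numpy as np
import pandas as pd

# Same data as Example 1
dit = {'August': [pd.NaT, 25, 34, np.nan, 1.1, 10],
       'September': [4.8, pd.NaT, 68, 9.25, np.nan, 0.9],
       'October': [78, 5.8, 8.52, 12, 1.6, 11]}
df = pd.DataFrame(data=dit)

# One boolean per column: True if the column has any NaN/NaT
has_missing = df.isna().any()
print(has_missing[has_missing].index.tolist())  # ['August', 'September']

# Number of missing values per column
print(df.isna().sum().tolist())                 # [2, 2, 0]

# Keeping only the complete columns gives the same result as dropna(axis=1)
kept = df.loc[:, ~has_missing]
print(kept.equals(df.dropna(axis=1)))           # True
```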
Example 2: Drop all columns that contain any NaN/NaT value, then reset the index with df.reset_index().
Python3
# Importing libraries
import pandas as pd
import numpy as np
# Initializing the nested list with Data set
player_list = [['M.S.Dhoni', 36, 75, 5428000],
[np.nan, 36, 74, np.nan],
['V.Kholi', 31, 70, 8428000],
['S.Smith', 34, 80, 4428000],
[pd.NaT, 39, 100, np.nan],
[np.nan, 33, 90.5, 7028000],
['K.Peterson', 42, 85, pd.NaT]]
# creating a pandas dataframe
df = pd.DataFrame(player_list, columns=['Name', 'Age',
'Weight', 'Salary'])
df
Output:
Python3
# Dropping the columns having NaN/NaT values
df = df.dropna(axis=1)
# Resetting the indices using df.reset_index()
df = df.reset_index(drop=True)
df
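Output:
In the example above, we drop the 'Name' and 'Salary' columns and then reset the index. Because dropping columns does not remove any rows, the index is already 0 to 6, so reset_index(drop=True) leaves it unchanged here.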
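The following minimal sketch shows when reset_index(drop=True) actually changes the row labels by comparing a column drop with a row drop. The two-column frame is made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in row 1
df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [4.0, 5.0, 6.0]})

# Dropping columns leaves the row labels untouched
cols_dropped = df.dropna(axis=1)
print(cols_dropped.index.tolist())                         # [0, 1, 2]

# Dropping rows leaves a gap in the row labels
rows_dropped = df.dropna(axis=0)
print(rows_dropped.index.tolist())                         # [0, 2]

# reset_index(drop=True) relabels rows 0..n-1 and discards the old labels
print(rows_dropped.reset_index(drop=True).index.tolist())  # [0, 1]
```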
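Example 3: Drop all columns that contain NaN/NaT values from a table of country data, then reset the index.
Python3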
# Importing libraries
import pandas as pd
import numpy as np
# creating and initializing a nested list
age_list = [[np.nan, 1952, 8425333, np.nan, 28.35],
['Australia', 1957, 9712569, 'Oceania', 24.26],
['Brazil', 1962, 76039390, np.nan, 30.24],
[pd.NaT, 1957, 637408000, 'Asia', 28.32],
['France', 1957, 44310863, pd.NaT, 25.21],
['India', 1952, 3.72e+08, pd.NaT, 27.36],
['United States', 1957, 171984000, 'Americas', 28.98]]
# creating a pandas dataframe
df = pd.DataFrame(age_list, columns=[
'Country', 'Year', 'Population', 'Continent', 'lifeExp'])
df
Output:
Python3
# Dropping the columns having NaN/NaT values
df = df.dropna(axis=1)
# Resetting the indices using df.reset_index()
df = df.reset_index(drop=True)
df
Output:
In the example above, the 'Country' and 'Continent' columns are dropped because they contain NaN and NaT values.
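Dropping every column that has even one gap can be too aggressive. The thresh parameter instead keeps the columns that have a minimum number of non-missing values. The sketch below applies it to the Example 3 data. The cut-off of 5 is an arbitrary value chosen for illustration:

```python
import numpy as np
import pandas as pd

# Same data as Example 3
age_list = [[np.nan, 1952, 8425333, np.nan, 28.35],
            ['Australia', 1957, 9712569, 'Oceania', 24.26],
            ['Brazil', 1962, 76039390, np.nan, 30.24],
            [pd.NaT, 1957, 637408000, 'Asia', 28.32],
            ['France', 1957, 44310863, pd.NaT, 25.21],
            ['India', 1952, 3.72e+08, pd.NaT, 27.36],
            ['United States', 1957, 171984000, 'Americas', 28.98]]
df = pd.DataFrame(age_list, columns=[
    'Country', 'Year', 'Population', 'Continent', 'lifeExp'])

# Non-missing values per column (7 rows in total)
print(df.notna().sum().tolist())  # [5, 7, 7, 3, 7]

# Keep a column only if it has at least 5 non-missing values,
# so 'Country' survives and only 'Continent' is dropped
kept = df.dropna(axis=1, thresh=5)
print(kept.columns.tolist())      # ['Country', 'Year', 'Population', 'lifeExp']
```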
Example 4: Use the subset parameter to drop only the columns that have a NaN/NaT value in a given row label.
Python3
# Importing libraries
import pandas as pd
import numpy as np
# Creating a dictionary
dit = {'August': [10, np.nan, 34, 4.85, 71.2, 1.1],
'September': [np.nan, 54, 68, 9.25, pd.NaT, 0.9],
'October': [np.nan, 5.8, 8.52, np.nan, 1.6, 11],
'November': [pd.NaT, 5.8, 50, 8.9, 77, pd.NaT] }
# Converting it to data frame
df = pd.DataFrame(data=dit)
# data frame
df
Output:
Python3
# Drop the columns that have a NaN/NaT value in the row labelled 3
# (with axis=1, subset lists row labels, not column positions)
df = df.dropna(subset=[3], axis=1)
# Resetting the indices using df.reset_index()
df = df.reset_index(drop=True)
df
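Output:
In the example above, subset=[3] limits the check to the row labelled 3. 'October' is the only column with a missing value in that row, so it is the only column dropped.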
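The meaning of subset depends on axis. With axis=1 it names row labels, and with the default axis=0 it names column labels. The sketch below uses the Example 4 data. The extra label choices ([1, 3] and 'October') were picked only for illustration:

```python
import numpy as np
import pandas as pd

# Same data as Example 4
dit = {'August': [10, np.nan, 34, 4.85, 71.2, 1.1],
       'September': [np.nan, 54, 68, 9.25, pd.NaT, 0.9],
       'October': [np.nan, 5.8, 8.52, np.nan, 1.6, 11],
       'November': [pd.NaT, 5.8, 50, 8.9, 77, pd.NaT]}
df = pd.DataFrame(data=dit)

# axis=1: subset names ROW labels; a column is dropped if it is
# missing a value in any of those rows
print(df.dropna(axis=1, subset=[3]).columns.tolist())
# ['August', 'September', 'November']
print(df.dropna(axis=1, subset=[1, 3]).columns.tolist())
# ['September', 'November']

# axis=0 (default): subset names COLUMN labels; a row is dropped if it
# is missing a value in any of those columns
print(df.dropna(subset=['October']).index.tolist())
# [1, 2, 4, 5]
```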