📅  Last modified: 2022-05-13 01:54:40.719000             🧑  Author: Mango

Replacing Missing Values Using Pandas in Python

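A dataset is a collection of attributes (columns) and rows. A dataset may contain missing data, which pandas represents as NaN (NA). In this article, we replace those missing values.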
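As a quick illustration of how pandas marks missing data, the sketch below builds a small made-up DataFrame and counts the missing entries per column with isna(). The name sample and its values are hypothetical and are not the article's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not the article's item.csv).
# In a numeric column, both np.nan and None are stored as NaN.
sample = pd.DataFrame({
    "quantity": [5.0, np.nan, 8.0],
    "price": [12.5, 30.0, None],
})

# isna() marks each missing cell as True; sum() counts them per column
missing_counts = sample.isna().sum()
print(missing_counts)
```

Counting missing cells first is a cheap way to see which columns need filling at all.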

Consider this dataset:

[Image: the dataset]

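Our data contains missing values in the quantity, price, bought, forenoon, and afternoon columns.

So, we can replace the missing values in the quantity column with the mean of that column, in the price column with the median, and in the bought column with the standard deviation. Missing values in the forenoon column are replaced with the minimum value of that column, and missing values in the afternoon column with the maximum value of that column.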
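To make the difference between these statistics concrete, here is a minimal sketch on a single made-up column. The Series col and its values are hypothetical; the point is only that mean() and median() skip NaN by default, so the fill value is computed from the known entries.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing entry
col = pd.Series([10.0, 20.0, np.nan, 60.0])

# mean() and median() ignore NaN by default (skipna=True)
mean_value = col.mean()      # (10 + 20 + 60) / 3 = 30.0
median_value = col.median()  # middle value of 10, 20, 60 = 20.0

# fillna() returns a new Series; col itself is left unchanged
filled_with_mean = col.fillna(mean_value)
filled_with_median = col.fillna(median_value)

print(filled_with_mean.tolist())    # [10.0, 20.0, 30.0, 60.0]
print(filled_with_median.tolist())  # [10.0, 20.0, 20.0, 60.0]
```

Here the large value 60 pulls the mean up to 30 while the median stays at 20, which is one reason the median may be preferred for skewed columns such as prices.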

Approach:

  • Import the module
  • Load the dataset
  • Fill in the missing values
  • Verify the dataset

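Syntax:

data['column'] = data['column'].fillna(value)

where value is a statistic computed from that column, for example data['column'].mean().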

Below is the implementation:

Python3
# importing pandas module
import pandas as pd
  
# loading data set
data = pd.read_csv('item.csv')
  
# display the data
print(data)


Output:

Next, we replace the missing values with the mean, median, standard deviation, minimum, and maximum of the respective columns:

Python3

# replacing missing values in quantity
# column with mean of that column
data['quantity'] = data['quantity'].fillna(data['quantity'].mean())
  
# replacing missing values in price column
# with median of that column
data['price'] = data['price'].fillna(data['price'].median())
  
# replacing missing values in bought column with
# standard deviation of that column
data['bought'] = data['bought'].fillna(data['bought'].std())
  
# replacing missing values in forenoon  column with
# minimum number of that column
data['forenoon'] = data['forenoon'].fillna(data['forenoon'].min())
  
# replacing missing values in afternoon  column with 
# maximum number of that column
data['afternoon'] = data['afternoon'].fillna(data['afternoon'].max())
  
# verify the data set: display it after filling
print(data)

Output:
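As a possible alternative to filling each column in a separate statement, the sketch below passes a dictionary of per-column fill values to a single fillna() call and then checks that no missing values remain. Since item.csv is not reproduced here, the DataFrame is a hypothetical stand-in that only reuses the article's column names; the values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for item.csv (same column names, made-up values)
data = pd.DataFrame({
    "quantity": [2.0, np.nan, 4.0],
    "price": [10.0, 30.0, np.nan],
    "bought": [1.0, np.nan, 3.0],
    "forenoon": [np.nan, 5.0, 7.0],
    "afternoon": [6.0, np.nan, 9.0],
})

# One fill value per column, each computed from that column's known entries
fill_values = {
    "quantity": data["quantity"].mean(),
    "price": data["price"].median(),
    "bought": data["bought"].std(),
    "forenoon": data["forenoon"].min(),
    "afternoon": data["afternoon"].max(),
}

# DataFrame.fillna accepts a dict mapping column names to fill values
filled = data.fillna(value=fill_values)

# Verify the result: total number of missing cells left in the frame
remaining = int(filled.isna().sum().sum())
print(filled)
print(remaining)
```

Computing all fill values before calling fillna() keeps every statistic based on the original data, and the final count gives a simple check for the "verify the dataset" step.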