Replacing Missing Values Using Pandas in Python
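A data set is a collection of attributes and rows. A data set may contain missing data, represented as NA (pandas displays it as NaN). In this article, we replace those missing values.
Consider the following data set, which the code in this article loads from item.csv.
The data contains missing values in the quantity, price, bought, forenoon, and afternoon columns.
We therefore replace missing values in the quantity column with that column's mean, in the price column with its median, and in the bought column with its standard deviation. Missing values in the forenoon column are replaced with that column's minimum, and those in the afternoon column with that column's maximum.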
Approach:
- Import the module
- Load the data set
- Fill in the missing values
- Verify the data set
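
The four steps above can be sketched end to end on a small in-memory table. This is only an illustration: the DataFrame below is a made-up stand-in for the real data set, and only two of the article's columns are used.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the loaded data set (values are made up).
data = pd.DataFrame({
    "quantity": [10.0, np.nan, 30.0],
    "price": [5.0, 7.0, np.nan],
})

# Count missing values per column before filling.
missing_before = data.isna().sum()

# Fill each column with a statistic computed from its non-missing values.
data["quantity"] = data["quantity"].fillna(data["quantity"].mean())
data["price"] = data["price"].fillna(data["price"].median())

# Verify: no missing values should remain.
missing_after = data.isna().sum()
print(missing_before)
print(missing_after)
```

Both `mean()` and `median()` skip missing values by default, so the fill value is computed only from the entries that are present.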
Syntax:
Mean: data=data.fillna(data.mean())
Median: data=data.fillna(data.median())
Standard Deviation: data=data.fillna(data.std())
Min: data=data.fillna(data.min())
Max: data=data.fillna(data.max())
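
The DataFrame-wide forms above fill every column with its own statistic in one call. If the table also holds text columns, recent pandas versions may raise an error when computing the mean over them, so the sketch below passes `numeric_only=True`. The `item` column and all values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Made-up table with one text column and two numeric columns.
data = pd.DataFrame({
    "item": ["pen", "book", "bag"],   # text column, left untouched
    "quantity": [2.0, np.nan, 4.0],
    "price": [1.0, 3.0, np.nan],
})

# One mean per numeric column, returned as a Series indexed by column name.
column_means = data.mean(numeric_only=True)

# fillna matches the Series index against the column labels.
filled = data.fillna(column_means)
print(filled)
```

The same pattern should work with `median`, `std`, `min`, and `max` in place of `mean`.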
Below is the implementation:
Python3
# importing pandas module
import pandas as pd
# loading data set
data = pd.read_csv('item.csv')
# display the data
print(data)
Output:
Next, we replace the missing values in each column with that column's mean, median, standard deviation, minimum, or maximum.
Python3
# replacing missing values in quantity
# column with mean of that column
data['quantity'] = data['quantity'].fillna(data['quantity'].mean())
# replacing missing values in price column
# with median of that column
data['price'] = data['price'].fillna(data['price'].median())
# replacing missing values in bought column with
# standard deviation of that column
data['bought'] = data['bought'].fillna(data['bought'].std())
# replacing missing values in forenoon column with
# minimum number of that column
data['forenoon'] = data['forenoon'].fillna(data['forenoon'].min())
# replacing missing values in afternoon column with
# maximum number of that column
data['afternoon'] = data['afternoon'].fillna(data['afternoon'].max())
# display the data after filling
print(data)
Output:
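
The last step of the approach, verifying the data set, can be done by counting the missing values that remain. The sketch below also shows an alternative way to apply the same per-column choices in a single call by passing a dictionary to `fillna`. The column names follow the article, but the values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Made-up values; each column has exactly one missing entry.
data = pd.DataFrame({
    "quantity": [1.0, np.nan, 3.0],
    "price": [10.0, 20.0, np.nan],
    "bought": [2.0, 4.0, np.nan],
    "forenoon": [5.0, np.nan, 9.0],
    "afternoon": [np.nan, 6.0, 8.0],
})

# Map each column to the statistic used as its fill value.
fill_values = {
    "quantity": data["quantity"].mean(),
    "price": data["price"].median(),
    "bought": data["bought"].std(),
    "forenoon": data["forenoon"].min(),
    "afternoon": data["afternoon"].max(),
}

# A dictionary lets fillna use a different value per column.
filled = data.fillna(fill_values)

# Verify: total number of missing values left in the table.
remaining = int(filled.isna().sum().sum())
print(filled)
print("missing values remaining:", remaining)
```

Unlike the column-by-column assignments, this returns a new DataFrame and leaves `data` unchanged, which makes it easy to compare the table before and after filling.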