📅  Last modified: 2023-12-03 14:47:27.614000             🧑  Author: Mango
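SimpleImputer is a class for filling in missing values. By default it works on a copy of the input, so the original data is left unchanged.
To use SimpleImputer, import it from the sklearn.impute module. Its constructor accepts several optional parameters; the four covered here are listed below.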
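| Parameter | Description |
|---|---|
| missing_values | The placeholder that marks a value as missing; defaults to np.nan |
| strategy | How to fill missing values: "mean", "median", "most_frequent", or "constant"; defaults to "mean" |
| fill_value | The value used for filling when strategy is "constant"; if left unset, 0 is used for numeric data and "missing_value" for string or object data |
| verbose | Whether to log imputation output. Deprecated in scikit-learn 1.1 and removed in 1.3 |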
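from sklearn.impute import SimpleImputer
# Create an imputer that fills missing values with the column mean
imputer = SimpleImputer(strategy="mean")
# Fill the data (X is a 2-D numeric array or DataFrame that contains missing values)
X_filled = imputer.fit_transform(X)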
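The snippet above leaves X undefined, so here is a minimal self-contained sketch that compares the four strategies. The toy array and the helper function `impute` are made up for illustration and are not part of the original article; `statistics_` is the fitted attribute that holds the per-column fill values.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical toy data: two numeric columns, one missing value in each.
X = np.array([
    [1.0, 10.0],
    [1.0, np.nan],
    [np.nan, 10.0],
    [4.0, 40.0],
])


def impute(strategy, **kwargs):
    """Fit a SimpleImputer with the given strategy; return (filled array, imputer)."""
    imputer = SimpleImputer(strategy=strategy, **kwargs)
    return imputer.fit_transform(X), imputer


filled_mean, imp_mean = impute("mean")
filled_median, _ = impute("median")
filled_mode, _ = impute("most_frequent")
# fill_value is only used when strategy="constant"
filled_const, _ = impute("constant", fill_value=0.0)

# Per-column values learned during fit (here: the column means)
print(imp_mean.statistics_)
print(filled_mean)
# The input array is untouched: it still contains its two NaNs
print(np.isnan(X).sum())
```

Because the imputer copies its input by default, X keeps its missing values and only the returned arrays are filled.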
The following example uses pandas to read a CSV file that contains missing values and fills them with SimpleImputer.
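import pandas as pd
from sklearn.impute import SimpleImputer
# Read the CSV file and count the missing values in each column
df = pd.read_csv("data.csv")
print(df.isnull().sum())
# Create an imputer that fills missing values with the column mean
imputer = SimpleImputer(strategy="mean")
# Fill the missing values (the result is a NumPy array, not a DataFrame)
df_filled = imputer.fit_transform(df)
# Print the filled data
print(df_filled)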
Output:
area 1
price 4
rooms 2
bathrooms 2
distance_city 1
dtype: int64
[[2.00000000e+02 1.40000000e+05 7.00000000e+00 2.00000000e+00
1.00000000e+01]
[3.50000000e+02 2.10000000e+05 8.00000000e+00 3.00000000e+00
1.50000000e+01]
[4.30000000e+02 2.40000000e+05 9.00000000e+00 2.00000000e+00
2.00000000e+01]
[2.00000000e+02 1.50000000e+05 5.00000000e+00 2.00000000e+00
1.00000000e+01]
[3.00000000e+02 1.80000000e+05 7.00000000e+00 2.00000000e+00
1.50000000e+01]
[4.30000000e+02 2.40000000e+05 9.00000000e+00 3.00000000e+00
2.00000000e+01]
[2.50000000e+02 1.70000000e+05 6.00000000e+00 2.00000000e+00
1.50000000e+01]
[2.20000000e+02 1.60000000e+05 5.00000000e+00 2.00000000e+00
1.50000000e+01]
[3.20000000e+02 2.00000000e+05 7.00000000e+00 3.00000000e+00
2.00000000e+01]
[2.50000000e+02 1.80000000e+05 6.00000000e+00 2.00000000e+00
1.50000000e+01]
[3.20000000e+02 2.00000000e+05 7.00000000e+00 3.00000000e+00
2.00000000e+01]
[2.50000000e+02 1.50000000e+05 5.00000000e+00 2.00000000e+00
1.50000000e+01]
[4.00000000e+02 1.80000000e+05 6.00000000e+00 2.00000000e+00
2.00000000e+01]
[3.00000000e+02 2.20000000e+05 7.00000000e+00 3.00000000e+00
2.00000000e+01]
[4.00000000e+02 1.80000000e+05 6.00000000e+00 2.00000000e+00
2.00000000e+01]
[2.00000000e+02 1.20000000e+05 5.00000000e+00 2.00000000e+00
1.00000000e+01]
[2.50000000e+02 1.50000000e+05 7.00000000e+00 2.00000000e+00
1.50000000e+01]
[3.00000000e+02 1.60000000e+05 7.00000000e+00 3.00000000e+00
2.00000000e+01]
[4.00000000e+02 1.80000000e+05 7.00000000e+00 2.00000000e+00
1.00000000e+01]
[2.00000000e+02 1.50000000e+05 5.00000000e+00 2.00000000e+00
1.00000000e+01]]
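As the output shows, fit_transform returns a plain NumPy array by default, so the column names are lost. One possible way to keep them is to wrap the result back into a DataFrame. The sketch below uses a small in-memory DataFrame with made-up values as a stand-in for data.csv, whose contents are not shown in the article.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical stand-in for data.csv (values invented for illustration)
df = pd.DataFrame({
    "area": [200.0, 350.0, np.nan, 250.0],
    "price": [140000.0, np.nan, 240000.0, 160000.0],
    "rooms": [7.0, 8.0, 9.0, np.nan],
})

imputer = SimpleImputer(strategy="mean")

# Rebuild a DataFrame around the imputed array, reusing the original labels
df_filled = pd.DataFrame(
    imputer.fit_transform(df),
    columns=df.columns,
    index=df.index,
)

print(df_filled)
print(df_filled.isnull().sum())
```

In scikit-learn 1.2 and later, calling `imputer.set_output(transform="pandas")` before fitting is another way to get a DataFrame back directly.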