📜  SimpleImputer (1)

📅  Last modified: 2023-12-03 14:47:27.614000             🧑  Author: Mango

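SimpleImputer is a scikit-learn class for filling in missing values. With its default settings (copy=True) it returns a new, imputed array and leaves the original data unchanged.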
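
To illustrate the point about the original data staying intact, here is a minimal sketch. The toy array `X` is a hypothetical example, not data from this article, and the imputer relies on the default `copy=True`.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical toy data: two numeric columns, one missing value in each
X = np.array([[1.0, 10.0],
              [np.nan, 20.0],
              [3.0, np.nan]])

imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X)

# The input array still contains its NaNs; the result is a separate array
original_nan_count = int(np.isnan(X).sum())
filled_nan_count = int(np.isnan(X_filled).sum())
print(original_nan_count, filled_nan_count)
print(X_filled)
```

Each NaN is replaced by the mean of the observed values in its own column, so the first column is filled with 2.0 and the second with 15.0.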

Usage

To use SimpleImputer, import it from the sklearn.impute module. The constructor parameters used most often are listed below.

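| Parameter | Description |
|---|---|
| missing_values | The placeholder that marks a missing value; defaults to np.nan |
| strategy | How to fill missing values: "mean", "median", "most_frequent", or "constant"; defaults to "mean" |
| fill_value | The value used when strategy is "constant"; if left unset, 0 is used for numeric data and "missing_value" for strings |
| verbose | Controls the verbosity of the imputer; deprecated in scikit-learn 1.1 and removed in 1.3 |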
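
The strategies in the table can be compared side by side. The sketch below uses a hypothetical single-column array `X` and a hypothetical `fill_value` of 0.0, and records the value each strategy writes into the missing row.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical single-column data with one missing entry in the last row
X = np.array([[1.0], [2.0], [2.0], [7.0], [np.nan]])

results = {}
for strategy in ("mean", "median", "most_frequent"):
    imputer = SimpleImputer(missing_values=np.nan, strategy=strategy)
    # Record the value that replaced the NaN in the last row
    results[strategy] = float(imputer.fit_transform(X)[-1, 0])

# "constant" uses fill_value instead of a statistic computed from the data
constant_imputer = SimpleImputer(strategy="constant", fill_value=0.0)
results["constant"] = float(constant_imputer.fit_transform(X)[-1, 0])

print(results)
```

The "mean" and "median" strategies only apply to numeric data, while "most_frequent" and "constant" also work on string columns.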

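from sklearn.impute import SimpleImputer

# Create an imputer that replaces missing values with the column mean
imputer = SimpleImputer(strategy="mean")

# Fill the missing values; X is assumed to be a 2-D numeric array or DataFrame that contains NaNs
X_filled = imputer.fit_transform(X)

Worked example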

Read a CSV file that contains missing values with pandas, then fill them with SimpleImputer.

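import pandas as pd
from sklearn.impute import SimpleImputer

# Read the CSV file and count the missing values in each column
df = pd.read_csv("data.csv")
print(df.isnull().sum())

# Create an imputer that replaces missing values with the column mean
imputer = SimpleImputer(strategy="mean")

# Fill the missing values; the result is a NumPy array, not a DataFrame
df_filled = imputer.fit_transform(df)

# Print the imputed data
print(df_filled)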

Output:

area            1
price           4
rooms           2
bathrooms       2
distance_city   1
dtype: int64
[[2.00000000e+02 1.40000000e+05 7.00000000e+00 2.00000000e+00
  1.00000000e+01]
 [3.50000000e+02 2.10000000e+05 8.00000000e+00 3.00000000e+00
  1.50000000e+01]
 [4.30000000e+02 2.40000000e+05 9.00000000e+00 2.00000000e+00
  2.00000000e+01]
 [2.00000000e+02 1.50000000e+05 5.00000000e+00 2.00000000e+00
  1.00000000e+01]
 [3.00000000e+02 1.80000000e+05 7.00000000e+00 2.00000000e+00
  1.50000000e+01]
 [4.30000000e+02 2.40000000e+05 9.00000000e+00 3.00000000e+00
  2.00000000e+01]
 [2.50000000e+02 1.70000000e+05 6.00000000e+00 2.00000000e+00
  1.50000000e+01]
 [2.20000000e+02 1.60000000e+05 5.00000000e+00 2.00000000e+00
  1.50000000e+01]
 [3.20000000e+02 2.00000000e+05 7.00000000e+00 3.00000000e+00
  2.00000000e+01]
 [2.50000000e+02 1.80000000e+05 6.00000000e+00 2.00000000e+00
  1.50000000e+01]
 [3.20000000e+02 2.00000000e+05 7.00000000e+00 3.00000000e+00
  2.00000000e+01]
 [2.50000000e+02 1.50000000e+05 5.00000000e+00 2.00000000e+00
  1.50000000e+01]
 [4.00000000e+02 1.80000000e+05 6.00000000e+00 2.00000000e+00
  2.00000000e+01]
 [3.00000000e+02 2.20000000e+05 7.00000000e+00 3.00000000e+00
  2.00000000e+01]
 [4.00000000e+02 1.80000000e+05 6.00000000e+00 2.00000000e+00
  2.00000000e+01]
 [2.00000000e+02 1.20000000e+05 5.00000000e+00 2.00000000e+00
  1.00000000e+01]
 [2.50000000e+02 1.50000000e+05 7.00000000e+00 2.00000000e+00
  1.50000000e+01]
 [3.00000000e+02 1.60000000e+05 7.00000000e+00 3.00000000e+00
  2.00000000e+01]
 [4.00000000e+02 1.80000000e+05 7.00000000e+00 2.00000000e+00
  1.00000000e+01]
 [2.00000000e+02 1.50000000e+05 5.00000000e+00 2.00000000e+00
  1.00000000e+01]]