Python | Imputation using KNNImputer()

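KNNImputer is a scikit-learn class for filling in (imputing) missing values in a dataset. Instead of the naive approach of replacing every missing value with the column mean or median, it builds on the k-nearest neighbors (KNN) algorithm. We specify how many neighbors to consult, the K parameter, and each missing value is estimated as the mean of that feature over the K nearest rows that have a value for it.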
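To make the idea concrete, here is a minimal sketch of imputing a single value by hand. It uses the same marks table as the example later in this article. The helper `impute_one` is a hypothetical name introduced only for illustration, and the sketch assumes uniform weighting and scikit-learn's NaN-aware Euclidean distance. It is not the library's internal implementation.

```python
import numpy as np
from sklearn.metrics.pairwise import nan_euclidean_distances

# Rows are students; columns are Maths, Chemistry, Physics, Biology.
X = np.array([
    [80.0, 60.0, np.nan, 78.0],
    [90.0, 65.0, 57.0, 83.0],
    [np.nan, 56.0, 80.0, 67.0],
    [95.0, np.nan, 78.0, np.nan],
])


def impute_one(X, row, col, k=2):
    """Hypothetical helper: mean of column `col` over the k nearest rows
    that actually have a value in that column."""
    # Distances from `row` to every row, ignoring coordinates that are NaN.
    dist = nan_euclidean_distances(X)[row]
    # Only rows with an observed value in `col` can act as donors.
    donors = [i for i in range(len(X)) if i != row and not np.isnan(X[i, col])]
    nearest = sorted(donors, key=lambda i: dist[i])[:k]
    return float(np.mean([X[i, col] for i in nearest]))


# Estimate the missing Physics mark of the first student.
estimate = impute_one(X, row=0, col=2)
print(estimate)
```

For the first student, the two nearest rows with a Physics mark hold 57 and 80, so the estimate is their mean.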

It is implemented by the KNNImputer class, whose constructor accepts the following parameters, among others:

n_neighbors: the number of neighboring samples used to impute each missing value; default 5.
metric: the distance metric used to search for neighbors.
values – {nan_euclidean, callable}; default – nan_euclidean
weights: how the neighbors' values are weighted when they are averaged.
values – {uniform, distance, callable}; default – uniform
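The effect of these parameters can be sketched as follows. The data is the marks table used in this article's example, and the variable names are illustrative. With `weights="distance"`, closer neighbors contribute more to the average, so the result generally differs from the uniform mean.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Rows are students; columns are Maths, Chemistry, Physics, Biology.
X = np.array([
    [80.0, 60.0, np.nan, 78.0],
    [90.0, 65.0, 57.0, 83.0],
    [np.nan, 56.0, 80.0, 67.0],
    [95.0, np.nan, 78.0, np.nan],
])

# Plain average of the 2 nearest neighbors (metric spelled out explicitly).
uniform = KNNImputer(n_neighbors=2, weights="uniform", metric="nan_euclidean")

# Neighbors weighted by the inverse of their distance.
weighted = KNNImputer(n_neighbors=2, weights="distance")

filled_uniform = uniform.fit_transform(X)
filled_weighted = weighted.fit_transform(X)

# Compare the imputed Physics mark of the first student.
print("uniform :", filled_uniform[0, 2])
print("distance:", filled_weighted[0, 2])
```

Which weighting works better depends on the data, so it is usually worth comparing both on a validation set.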

Code: Python code illustrating the KNNImputer class

# import necessary libraries
import numpy as np
import pandas as pd

# import the KNNImputer class
from sklearn.impute import KNNImputer


# create a dataset of students' marks (np.nan marks a missing value);
# the name "marks" avoids shadowing the built-in dict
marks = {'Maths': [80, 90, np.nan, 95],
         'Chemistry': [60, 65, 56, np.nan],
         'Physics': [np.nan, 57, 80, 78],
         'Biology': [78, 83, 67, np.nan]}

# create a data frame from the dictionary
before_imputation = pd.DataFrame(marks)
# print the dataset before imputation
print("Data Before performing imputation\n", before_imputation)

# create a KNNImputer object that averages the 2 nearest neighbors
imputer = KNNImputer(n_neighbors=2)
after_imputation = imputer.fit_transform(before_imputation)
# print the dataset after imputation
print("\n\nAfter performing imputation\n", after_imputation)

Output:

Data Before performing imputation
    Maths  Chemistry  Physics  Biology
0   80.0       60.0      NaN     78.0
1   90.0       65.0     57.0     83.0
2    NaN       56.0     80.0     67.0
3   95.0        NaN     78.0      NaN


After performing imputation
 [[80.  60.  68.5 78. ]
 [90.  65.  57.  83. ]
 [87.5 56.  80.  67. ]
 [95.  58.  78.  72.5]]

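Note: fit_transform returns a NumPy array, so the transformed data no longer carries the column labels of the original DataFrame.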
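If the labels are needed afterwards, one option is to wrap the array in a new DataFrame that reuses the original columns and index. The sketch below assumes no column is entirely NaN. By default KNNImputer drops such columns, in which case the column count would no longer match.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

marks = pd.DataFrame({
    'Maths': [80, 90, np.nan, 95],
    'Chemistry': [60, 65, 56, np.nan],
    'Physics': [np.nan, 57, 80, 78],
    'Biology': [78, 83, 67, np.nan],
})

imputer = KNNImputer(n_neighbors=2)

# Rebuild a DataFrame around the returned array, reusing the original labels.
# Assumption: every column has at least one observed value.
imputed = pd.DataFrame(
    imputer.fit_transform(marks),
    columns=marks.columns,
    index=marks.index,
)
print(imputed)
```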