sklearn.Binarizer() in Python

Last modified: 2021-04-17 09:59:21             Author: Mango

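sklearn.preprocessing.Binarizer() is a transformer class in scikit-learn's preprocessing module. It discretizes continuous feature values into two classes: every value greater than a chosen threshold becomes 1, and every value less than or equal to the threshold becomes 0.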

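Example 1:
The pixel values of an 8-bit grayscale image range from 0 (black) to 255 (white), but suppose a pure black-and-white image is needed. With Binarizer() you can set a threshold of 127, so that pixel values from 0 to 127 are converted to 0 and values from 128 to 255 are converted to 1.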
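The thresholding in Example 1 can be sketched as follows. The pixel array is made up for illustration and is not part of the article's dataset; only the threshold of 127 comes from the example.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Hypothetical 2x4 patch of 8-bit grayscale pixel values (illustrative only)
pixels = np.array([[0, 64, 127, 128],
                   [200, 255, 13, 127]])

# Values strictly greater than the threshold become 1, all others become 0
binarizer = Binarizer(threshold=127)
black_white = binarizer.fit_transform(pixels)

print(black_white)
# [[0 0 0 1]
#  [1 1 0 0]]
```

Note that a pixel value of exactly 127 maps to 0, because only values strictly above the threshold are set to 1.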

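Example 2:
A machine's records include "success rate" as a feature. These values are continuous, ranging from 10% to 99%, but a researcher only wants to use this data, together with other given parameters, to predict whether the machine passes or fails. Binarizing the success rate turns it into exactly that two-valued status.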

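Syntax:

sklearn.preprocessing.Binarizer(*, threshold=0.0, copy=True)

Parameters:

threshold : feature values less than or equal to this are replaced by 0, values above it by 1 (default 0.0). In recent scikit-learn versions it must be passed as a keyword argument.
copy : if False, the transformer tries to binarize the input in place instead of working on a copy (default True).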

Returns:

Binarized feature values
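As a minimal sketch of the defaults, the snippet below calls Binarizer() with no arguments on a small made-up array. With the default threshold of 0.0, only strictly positive values map to 1, and with the default copy=True the input array is left untouched.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Made-up sample row containing a negative, a zero and a positive value
X = np.array([[-1.5, 0.0, 2.3]])

# Default settings: threshold=0.0, copy=True
default_binarizer = Binarizer()
result = default_binarizer.fit_transform(X)

print(result)  # [[0. 0. 1.]]
print(X)       # the original array is unchanged
```

The output keeps the floating-point dtype of the input, which is why the values print as 0. and 1. here.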

Download the dataset:
Go to the link and download Data.csv

Below is the Python code explaining sklearn.Binarizer():

# Python code explaining how
# to Binarize feature values
   
""" PART 1
    Importing Libraries """
   
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
  
# Sklearn library 
from sklearn import preprocessing
  
""" PART 2
    Importing Data """
   
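# Adjust this path to wherever the downloaded CSV file is saved
data_set = pd.read_csv(
        'C:\\Users\\dell\\Desktop\\Data_for_Feature_Scaling.csv')
# print() is needed for the preview to appear when run as a script
print(data_set.head())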
  
# here Features - Age and Salary columns 
# are taken using slicing
# to binarize values
age = data_set.iloc[:, 1].values
salary = data_set.iloc[:, 2].values
print("\nOriginal age data values : \n", age)
print("\nOriginal salary data values : \n", salary)
  
""" PART 3
    Binarizing values """
  
from sklearn.preprocessing import Binarizer
  
x = age
x = x.reshape(1, -1)
y = salary
y = y.reshape(1, -1)
  
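# For age, let threshold be 35
# For salary, let threshold be 61000
# threshold must be passed by keyword in recent scikit-learn versions
binarizer_1 = Binarizer(threshold=35)
binarizer_2 = Binarizer(threshold=61000)
  
# Transformed feature: values above the threshold become 1,
# values equal to or below it become 0
print("\nBinarized age : \n", binarizer_1.fit_transform(x))
  
print("\nBinarized salary : \n", binarizer_2.fit_transform(y))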

Output:

   Country  Age  Salary  Purchased
0   France   44   72000          0
1    Spain   27   48000          1
2  Germany   30   54000          0
3    Spain   38   61000          0
4  Germany   40    1000          1

Original age data values : 
 [44 27 30 38 40 35 78 48 50 37]

Original salary data values : 
 [72000 48000 54000 61000  1000 58000 52000 79000 83000 67000]

Binarized age : 
 [[1 0 0 1 1 0 1 1 1 1]]

Binarized salary : 
 [[1 0 0 0 0 0 0 1 1 1]]
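In the output above, age 35 and salary 61000 both map to 0 even though they equal their thresholds. That follows from the strict "greater than" comparison, which the sketch below reproduces with a plain NumPy expression. The age values are copied from the printed output, and the NumPy comparison is only an illustrative equivalent, not how the article's code is written.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Age values copied from the printed output above
age = np.array([44, 27, 30, 38, 40, 35, 78, 48, 50, 37]).reshape(1, -1)

by_sklearn = Binarizer(threshold=35).fit_transform(age)

# Equivalent element-wise rule: 1 where value > threshold, else 0
by_numpy = (age > 35).astype(int)

print(by_sklearn)                            # [[1 0 0 1 1 0 1 1 1 1]]
print(np.array_equal(by_sklearn, by_numpy))  # True
```

If values equal to the threshold should count as 1 instead, one option is to lower the threshold slightly (for integer data, use 34 rather than 35).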