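sklearn.preprocessing.Binarizer() is a class in scikit-learn's preprocessing module. It discretizes continuous feature values into two states, 0 and 1, by comparing each value against a threshold.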
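Example 1:
An 8-bit grayscale image stores pixel values ranging from 0 (black) to 255 (white), but we need a pure black-and-white image. With Binarizer() we can set a threshold of 127, so pixel values from 0 to 127 are converted to 0 and values from 128 to 255 are converted to 1.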
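A minimal sketch of Example 1, assuming a tiny hypothetical 2×3 grayscale image held in a NumPy array (the pixel values are made up for illustration). With threshold=127, every pixel at or below 127 becomes 0 and every pixel above it becomes 1.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Hypothetical 8-bit grayscale image (2 rows x 3 columns), values 0..255
image = np.array([[0, 100, 127],
                  [128, 200, 255]], dtype=np.uint8)

# Pixels <= 127 become 0 (black), pixels > 127 become 1 (white)
binarizer = Binarizer(threshold=127)
black_and_white = binarizer.fit_transform(image)
print(black_and_white)
```

Note that 127 itself maps to 0, because only values strictly greater than the threshold become 1.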
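Example 2:
A machine's records include a "success rate" feature. Its values are continuous, ranging from 10% to 99%, but the researchers only need a pass or fail status for each machine, which they want to predict from the other given parameters. Binarizer() can turn each success rate into 1 (pass) or 0 (fail) using a chosen threshold.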
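Example 2 can be sketched the same way. The success rates and the 0.50 cut-off below are hypothetical, chosen only to illustrate turning a continuous rate into a pass (1) / fail (0) label; the example itself does not specify a threshold.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Hypothetical success rates as fractions (0.10 = 10%), one machine per row
success_rate = np.array([[0.10], [0.45], [0.50], [0.73], [0.99]])

# Hypothetical cut-off: a rate above 0.50 counts as "pass" (1), otherwise "fail" (0)
pass_fail = Binarizer(threshold=0.50).fit_transform(success_rate)
print(pass_fail.ravel())
```

The binarized column could then serve as the target label for a classifier trained on the machine's other parameters.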
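Syntax :
sklearn.preprocessing.Binarizer(*, threshold=0.0, copy=True)
(In recent scikit-learn releases, threshold and copy are keyword-only, so write Binarizer(threshold=35) rather than Binarizer(35).)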
Parameters :
threshold : [float, optional] Values less than or equal to the threshold are mapped to 0; values greater than it are mapped to 1. Default is 0.0.
copy : [boolean, optional] If set to False, binarization is attempted in place to avoid a copy (this is not guaranteed for every input type). Default is True.
Returns :
Binarized feature values: an array of the same shape as the input, containing only 0s and 1s.
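To make the threshold rule concrete, here is a small sketch on arbitrary input values comparing the default threshold=0.0 with a custom threshold. It also checks that Binarizer agrees with the plain NumPy comparison X > threshold, so a value exactly equal to the threshold maps to 0.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Arbitrary sample values for illustration
X = np.array([[-1.5, 0.0, 2.0],
              [3.0, 5.0, 7.5]])

# Default threshold is 0.0: only strictly positive values become 1
default_out = Binarizer().fit_transform(X)

# Custom threshold: 5.0 is not greater than 5.0, so it maps to 0
custom_out = Binarizer(threshold=5.0).fit_transform(X)

# The same rule written directly in NumPy
manual_out = (X > 5.0).astype(X.dtype)

print(default_out)
print(custom_out)
print(np.array_equal(custom_out, manual_out))
```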
Download the dataset:
Go to the link and download Data.csv.
Below is Python code explaining sklearn.preprocessing.Binarizer():
# Python code explaining how
# to Binarize feature values
""" PART 1
Importing Libraries """
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Sklearn library
from sklearn import preprocessing
""" PART 2
Importing Data """
# Adjust the path to wherever you saved Data.csv
data_set = pd.read_csv('Data.csv')
print(data_set.head())
# here Features - Age and Salary columns
# are taken using slicing
# to binarize values
age = data_set.iloc[:, 1].values
salary = data_set.iloc[:, 2].values
print ("\nOriginal age data values : \n", age)
print ("\nOriginal salary data values : \n", salary)
""" PART 3
Binarizing values """
from sklearn.preprocessing import Binarizer
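# Binarizer expects a 2-D array. reshape(1, -1) puts all values in a
# single row; because Binarizer works element-wise, the 0/1 results are
# the same as with the more common one-column layout reshape(-1, 1).
x = age.reshape(1, -1)
y = salary.reshape(1, -1)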
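# For age, let the threshold be 35
# For salary, let the threshold be 61000
# threshold is passed by keyword (positional use fails in recent scikit-learn)
binarizer_1 = Binarizer(threshold=35)
binarizer_2 = Binarizer(threshold=61000)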
# Transformed feature
print ("\nBinarized age : \n", binarizer_1.fit_transform(x))
print ("\nBinarized salary : \n", binarizer_2.fit_transform(y))
Output :
Country Age Salary Purchased
0 France 44 72000 0
1 Spain 27 48000 1
2 Germany 30 54000 0
3 Spain 38 61000 0
4 Germany 40 1000 1
Original age data values :
[44 27 30 38 40 35 78 48 50 37]
Original salary data values :
[72000 48000 54000 61000 1000 58000 52000 79000 83000 67000]
Binarized age :
[[1 0 0 1 1 0 1 1 1 1]]
Binarized salary :
[[1 0 0 0 0 0 0 1 1 1]]
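Because Binarizer treats each value independently, the usual convention is to keep each feature as its own column rather than as a single row. The sketch below rebuilds the Age and Salary values printed above, so it runs without the CSV, and applies both thresholds in one step with scikit-learn's ColumnTransformer; using ColumnTransformer here is our own choice, not part of the original code.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import Binarizer

# Age and Salary values copied from the output above (Country/Purchased omitted)
df = pd.DataFrame({
    "Age": [44, 27, 30, 38, 40, 35, 78, 48, 50, 37],
    "Salary": [72000, 48000, 54000, 61000, 1000, 58000, 52000, 79000, 83000, 67000],
})

# One Binarizer per column, each with its own threshold
binarize_columns = ColumnTransformer([
    ("age", Binarizer(threshold=35), ["Age"]),
    ("salary", Binarizer(threshold=61000), ["Salary"]),
])

binary = binarize_columns.fit_transform(df)  # shape (10, 2): one column per feature
print(binary[:, 0])  # binarized age
print(binary[:, 1])  # binarized salary
```

The two printed columns carry the same 0/1 values as the binarized age and salary rows shown above; only the layout differs.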