📜  Scikit-Learn - Classification with Naïve Bayes

📅  Last modified: 2020-12-10 05:54:41             🧑  Author: Mango


Bayes' theorem states the following relationship for finding the posterior probability of a class, i.e. the probability of a label given some observed features, $P(Y \mid \text{features})$:

$$P(Y \mid \text{features}) = \frac{P(Y)\, P(\text{features} \mid Y)}{P(\text{features})}$$

Here, $P(Y \mid \text{features})$ is the posterior probability of the class.

$P(Y)$ is the prior probability of the class.

$P(\text{features} \mid Y)$ is the likelihood, i.e. the probability of the predictors given the class.

$P(\text{features})$ is the prior probability of the predictors.
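To make the four terms concrete, here is a small worked sketch of the formula. The class names "A" and "B" and all the probabilities are hypothetical values chosen only for illustration. The evidence $P(\text{features})$ is not given directly; it is computed by summing $P(Y)\,P(\text{features} \mid Y)$ over all classes.

```python
# Hypothetical two-class example: every number here is made up for illustration.
priors = {"A": 0.3, "B": 0.7}        # P(Y)
likelihoods = {"A": 0.8, "B": 0.1}   # P(features | Y) for one observed feature vector


def posteriors(priors, likelihoods):
    """Apply Bayes' theorem: P(Y | features) = P(Y) * P(features | Y) / P(features)."""
    # Numerator P(Y) * P(features | Y) for each class
    joint = {y: priors[y] * likelihoods[y] for y in priors}
    # Evidence P(features): the numerators summed over all classes
    evidence = sum(joint.values())
    return {y: joint[y] / evidence for y in joint}


post = posteriors(priors, likelihoods)
print(post)
```

The evidence here is 0.3 × 0.8 + 0.7 × 0.1 = 0.31, so the posterior of class A is 0.24 / 0.31 ≈ 0.774. Class A has the smaller prior, but it becomes the more probable class once the likelihood is taken into account.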

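Scikit-learn provides different naïve Bayes classifier models, namely Gaussian, Multinomial, Complement and Bernoulli. The main difference between them is the assumption each makes about the distribution of $P(\text{features} \mid Y)$, i.e. the probability of the predictors given the class.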

1. Gaussian Naïve Bayes: this classifier assumes that the data from each label is drawn from a simple Gaussian distribution.

2. Multinomial Naïve Bayes: this classifier assumes that the features are drawn from a simple multinomial distribution.

3. Bernoulli Naïve Bayes: this model assumes that the features are binary (0s and 1s) in nature. One application of Bernoulli Naïve Bayes classification is text classification with the ‘bag of words’ model.

4. Complement Naïve Bayes: this classifier was designed to correct the severe assumptions made by the Multinomial Naïve Bayes classifier. It is suitable for imbalanced data sets.
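The four models share the same `fit`/`predict`/`score` interface. The sketch below fits each one on synthetic data of the kind it expects: continuous values for Gaussian, non-negative counts for Multinomial and Complement, and 0/1 values for Bernoulli.

The data, the per-class rates and the threshold used to binarize the counts are all hypothetical. Accuracy is measured on the training data only to show the API, not to compare the models.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, ComplementNB, GaussianNB, MultinomialNB

rng = np.random.default_rng(0)

# Hypothetical toy data: 100 samples, 5 features, two classes.
# Each class has its own Poisson rate per feature, so feature proportions differ by class.
y = rng.integers(0, 2, size=100)
rates = np.array([[1.0, 1.0, 4.0, 4.0, 2.0],   # class 0
                  [4.0, 4.0, 1.0, 1.0, 2.0]])  # class 1
X_counts = rng.poisson(rates[y])                              # non-negative integer counts
X_real = X_counts + rng.normal(0.0, 0.5, size=X_counts.shape)  # continuous values
X_binary = (X_counts > 1).astype(int)                         # 0/1 indicators (arbitrary threshold)

models = {
    "GaussianNB": (GaussianNB(), X_real),
    "MultinomialNB": (MultinomialNB(), X_counts),
    "ComplementNB": (ComplementNB(), X_counts),
    "BernoulliNB": (BernoulliNB(), X_binary),
}

scores = {}
for name, (clf, X) in models.items():
    # Fit each model on its matching input type and report training accuracy
    scores[name] = clf.fit(X, y).score(X, y)
    print(f"{name}: training accuracy = {scores[name]:.2f}")
```

Note that the rates differ per feature rather than being scaled uniformly. Multinomial and Complement NB model the relative proportions of the counts within a sample, so a uniform scaling across all features would give them little to work with.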



Example: building a Gaussian Naïve Bayes classifier with scikit-learn
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Load the breast cancer dataset and unpack its parts
data = load_breast_cancer()
label_names = data['target_names']
labels = data['target']
feature_names = data['feature_names']
features = data['data']

# Hold out 40% of the samples as a test set
train, test, train_labels, test_labels = train_test_split(
    features, labels, test_size=0.40, random_state=42
)

# Fit a Gaussian Naive Bayes classifier and predict the test labels
GNBclf = GaussianNB()
model = GNBclf.fit(train, train_labels)
preds = GNBclf.predict(test)
print(preds)

Output
   1 0 0 1 1 0 0 0 1 1 1 0 1 0 1 0 1 1 1 0 1 1 0 1 1 1 1
   1 1 0 1 1 1 1 1 1 0 1 0 1 1 0 1 1 1 1 1 1 1 1 0 0 1 1 
   1 1 1 0 0 1 1 0 0 1 1 1 0 0 1 1 0 0 1 0 1 1 1 1 1 1 0 
   1 1 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 1 0 0 1 0 0 1 1 1 0 
   1 1 0 1 1 0 0 0 1 1 1 0 0 1 1 0 1 0 0 1 1 0 0 0 1 1 1 
   0 1 1 0 0 1 0 1 1 0 1 0 0 1 1 1 1 1 1 1 0 0 1 1 1 1 1 
   1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 1 1 1 1 1 0 0 0 1 1 0 
   1 0 1 1 1 1 0 1 1 0 1 1 1 0 1 0 0 1 1 1 1 1 1 1 1 0 1 
   1 1 1 1 0 1 0 0 1 1 0 1
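The printed array alone does not show how good the predictions are. A natural next step is to compare them with the true test labels. The sketch below repeats the pipeline above with the same split parameters and then scores it with `sklearn.metrics.accuracy_score`.

The choice of accuracy as the metric is an assumption; for imbalanced data, other metrics may be more informative. The code also maps numeric predictions back to names using `target_names`, where 0 is malignant and 1 is benign.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Same data and split as the example above
data = load_breast_cancer()
train, test, train_labels, test_labels = train_test_split(
    data['data'], data['target'], test_size=0.40, random_state=42
)
preds = GaussianNB().fit(train, train_labels).predict(test)

# Fraction of test samples whose predicted label matches the true label
accuracy = accuracy_score(test_labels, preds)
print(f"Accuracy: {accuracy:.3f}")

# Translate the first few numeric predictions into class names
print([str(data['target_names'][p]) for p in preds[:5]])
```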
