Prerequisite: Getting started with Classification
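In this article, we discuss a method for calculating the efficiency of a binary classifier. Suppose we must classify products as belonging to either Class A or Class B.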
Let us define a few statistical parameters:
TP (True Positive) = number of Class A products that are classified as Class A products.
FN (False Negative) = number of Class A products that are classified as Class B products.
TN (True Negative) = number of Class B products that are classified as Class B products.
FP (False Positive) = number of Class B products that are classified as Class A products.
FN = N - TP; // where N is the number of Class A products
FP = M - TN; // where M is the number of Class B products
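Let us look at an example to understand these parameters better.
Suppose (+) denotes a candidate who is suitable for the job and (-) denotes a candidate who is not suitable for the job.
To calculate the efficiency of the classifier, we need the values of Sensitivity, Specificity and Accuracy.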
Sensitivity measures the proportion of positives that are correctly identified as such.
Also known as True Positive Rate (TPR).
Specificity measures the proportion of negatives that are correctly identified as such.
Also known as True Negative Rate (TNR).
Accuracy measures the proportion of all samples, positives and negatives together, that are correctly classified.
Sensitivity = ( TP / (TP+FN) ) * 100;
Specificity = ( TN/(TN+FP) ) * 100;
Accuracy = ( (TP+TN) / (TP+TN+FP+FN) ) * 100;
Efficiency = ( Sensitivity + Specificity + Accuracy ) / 3;
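The formulas above map directly onto code. The following is a minimal Python sketch, assuming the four counts are already known. The function name `selection_metrics` is hypothetical. Like the formulas, it returns percentages. Efficiency here is this article's plain average of the three measures. The sketch raises ZeroDivisionError if a class has no members.

```python
def selection_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Sensitivity, Specificity, Accuracy and the article's Efficiency, in percent."""
    sensitivity = tp / (tp + fn) * 100  # TPR: share of actual positives found
    specificity = tn / (tn + fp) * 100  # TNR: share of actual negatives found
    accuracy = (tp + tn) / (tp + tn + fp + fn) * 100  # share of all samples correct
    efficiency = (sensitivity + specificity + accuracy) / 3
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "efficiency": efficiency,
    }


print(selection_metrics(tp=2, tn=2, fp=2, fn=2))
# {'sensitivity': 50.0, 'specificity': 50.0, 'accuracy': 50.0, 'efficiency': 50.0}
```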
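Let us take the example above and calculate the efficiency of the selection.
Assume suitable candidates belong to Class A and unsuitable candidates belong to Class B.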
Before Interview : N = 4 and M = 4
After Interview :
TP = 2
TN = 2
FN = N - TP = 2
FP = M - TN = 2
Sensitivity = 2/(2+2)*100 = 50
Specificity = 2/(2+2)*100 = 50
Accuracy = (2+2)/(2+2+2+2)*100 = 50
Efficiency = (50+50+50)/3 = 50
So, the efficiency of the candidate selection is 50%.
Other performance measures:
- Error rate = (FP + FN) / (TP + TN + FP + FN)
- Precision = TP / (TP + FP)
- Recall = TP / (TP + FN)
- BCR (Balanced Classification Rate) = 1/2 * (TP / (TP + FN) + TN / (TN + FP))
- AUC = Area under the ROC curve
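The count-based measures in the list can be computed from the same four counts. The following is a minimal Python sketch. The function name `other_measures` and the sample counts are hypothetical. The results are fractions rather than percentages. Recall is the same quantity as Sensitivity. BCR is the mean of Sensitivity and Specificity. AUC is left out because it needs classifier scores rather than counts.

```python
def other_measures(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Error rate, Precision, Recall and BCR as fractions (hypothetical helper)."""
    total = tp + tn + fp + fn
    return {
        "error_rate": (fp + fn) / total,  # equals 1 - accuracy
        "precision": tp / (tp + fp),      # share of predicted positives that are correct
        "recall": tp / (tp + fn),         # same as sensitivity (TPR)
        "bcr": 0.5 * (tp / (tp + fn) + tn / (tn + fp)),  # mean of TPR and TNR
    }


measures = other_measures(tp=3, tn=4, fp=1, fn=2)
print({k: round(v, 3) for k, v in measures.items()})
# {'error_rate': 0.3, 'precision': 0.75, 'recall': 0.6, 'bcr': 0.7}
```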
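Receiver Operating Characteristic curve:
- Receiver Operating Characteristic (ROC) curve: a two-dimensional curve parameterized by one parameter of the classification algorithm, typically the decision threshold.
- AUC always lies between 0 and 1.
- The ROC curve is obtained by plotting the TPR on the y-axis against the FPR (1 - TNR) on the x-axis.
- AUC summarizes the model's performance across all thresholds. It equals the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative sample.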
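To make the ROC construction concrete, the following is a minimal Python sketch. It assumes the classifier outputs a numeric score per sample and predicts positive when the score is at least the threshold. Labels are 1 for positive and 0 for negative, and both classes must be present. The sketch sweeps every distinct score as a threshold to get (FPR, TPR) points. It then integrates them with the trapezoidal rule and cross-checks the result against the pairwise-ranking interpretation of AUC. The function names and sample data are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs, one per distinct threshold, starting at (0, 0)."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    # Lowering the threshold moves the curve from (0, 0) towards (1, 1).
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))  # x = FPR = 1 - TNR, y = TPR
    return points


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_rank(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(round(auc_trapezoid(roc_points(scores, labels)), 3))  # 0.889
print(round(auc_rank(scores, labels), 3))                   # 0.889
```

Both routes give 8/9 here because 8 of the 9 positive/negative pairs are ranked correctly.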