Maximizing the Area Under the ROC Curve by Pairwise Feature Combination
MARROCCO, Claudio; TORTORELLA, Francesco
2008-01-01
Abstract
The majority of available classification systems focus on minimizing the classification error rate. This is not always a suitable metric, especially when dealing with two-class problems with skewed class and cost distributions. In such cases, an effective criterion for measuring the quality of a decision rule is the area under the Receiver Operating Characteristic curve (AUC), which is also useful for measuring the ranking quality of a classifier, as required in many real applications. In this paper we propose a nonparametric linear classifier based on the maximization of the AUC. The approach relies on the analysis of the Wilcoxon–Mann–Whitney statistic of each single feature and on an iterative pairwise coupling of the features to optimize the ranking of the combined feature. Because of this pairwise feature evaluation, the proposed procedure is essentially different from other classifiers that use the AUC as a criterion. Experiments performed on synthetic and real data sets, and comparisons with previous approaches, confirm the effectiveness of the proposed method.
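As an illustration of the two ingredients named in the abstract, the following minimal sketch computes the AUC of a single feature as the normalized Wilcoxon–Mann–Whitney statistic and then searches for a weight that maximizes the AUC of a linear combination of two features. It is not the authors' procedure: the function names `wmw_auc` and `best_pair_weight`, the exhaustive grid search, and the weight range [-1, 1] are assumptions made here for illustration only, since the abstract does not say how the pairwise weight is chosen.

```python
import numpy as np


def wmw_auc(scores, labels):
    """AUC as the normalized Wilcoxon-Mann-Whitney statistic.

    Counts the fraction of (positive, negative) pairs in which the positive
    sample scores higher; ties contribute 1/2.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Pairwise differences between every positive and every negative score.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))


def best_pair_weight(x_i, x_j, labels, n_grid=201):
    """Hypothetical helper: grid-search alpha in [-1, 1] (an assumed range)
    to maximize the AUC of the combined feature x_i + alpha * x_j."""
    x_i = np.asarray(x_i, dtype=float)
    x_j = np.asarray(x_j, dtype=float)
    alphas = np.linspace(-1.0, 1.0, n_grid)
    aucs = [wmw_auc(x_i + a * x_j, labels) for a in alphas]
    k = int(np.argmax(aucs))  # first alpha reaching the maximum AUC
    return float(alphas[k]), aucs[k]


if __name__ == "__main__":
    labels = np.array([0, 0, 1, 1])
    x_i = np.array([1.0, 2.0, 2.0, 3.0])  # one tie between the classes
    x_j = np.array([0.0, 0.0, 1.0, 0.0])  # breaks that tie when added
    print("AUC of x_i alone:", wmw_auc(x_i, labels))
    print("best (alpha, AUC):", best_pair_weight(x_i, x_j, labels))
```

In an iterative scheme of the kind the abstract describes, the combined feature from one pair would be coupled with a further feature in the next step; how features are ordered and paired is left open here.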