Evaluation of classifiers

Classifiers based on gene expression are generally probabilistic; that is, they only predict that a certain percentage of the individuals with a given expression profile will also have the phenotype, or outcome, of interest. Statistical validation is therefore necessary before such models can be used, especially in clinical settings (44, 45).

The most satisfactory approaches to validation use data other than those used to develop the classifier. When only a single study is available, this can often be achieved by setting aside samples for validation purposes, as illustrated by Dudoit et al. (36). Statistical validation of probabilistic models (46) should address both refinement, that is, the ability of the classifier to discriminate between classes, and calibration, that is, the agreement between the predicted fraction of individuals with the phenotype and the fraction actually observed in the validation sample.
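One way to carry out such a validation is sketched below, assuming the classifier returns a predicted probability of the phenotype for each validation sample. Refinement is summarized here by the area under the ROC curve (the probability that a randomly chosen case is ranked above a randomly chosen non-case), and calibration by a table that compares, within bins of predicted probability, the mean predicted probability with the observed fraction of cases. Both summaries are common choices substituted in for illustration rather than measures prescribed by the text, and all function names are hypothetical.

```python
import random

def holdout_split(n_samples, frac_validation=0.3, seed=0):
    """Randomly set aside a fraction of sample indices for validation."""
    rng = random.Random(seed)
    idx = list(range(n_samples))
    rng.shuffle(idx)
    n_val = int(round(frac_validation * n_samples))
    return idx[n_val:], idx[:n_val]  # (training indices, validation indices)

def auc(probs, labels):
    """Refinement: fraction of (case, non-case) pairs in which the case gets
    the higher predicted probability; ties count one half."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("the validation sample must contain both classes")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def calibration_table(probs, labels, n_bins=5):
    """Calibration: per bin of predicted probability, return
    (bin index, sample count, mean predicted probability, observed fraction)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[b].append((p, y))
    table = []
    for b, members in enumerate(bins):
        if members:
            mean_pred = sum(p for p, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            table.append((b, len(members), mean_pred, observed))
    return table

# Toy usage with made-up validation predictions (illustrative only).
probs = [0.1, 0.2, 0.35, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1]
print(auc(probs, labels))  # 8 of 9 case/non-case pairs correctly ordered
for row in calibration_table(probs, labels, n_bins=5):
    print(row)
```

A well-calibrated classifier shows mean predicted probabilities close to the observed fractions in every bin; a classifier can discriminate well (high AUC) and still be poorly calibrated, which is why the two are assessed separately.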

An alternative to setting aside samples for validation is cross-validation. In K-fold cross-validation, for example, the data are split into K subsets and the classifier is trained K times, each time on K - 1 subsets, with the remaining subset set aside for validation. The average of the correct classification rates across the K analyses is then a nearly unbiased estimate of the correct classification rate (47).
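A minimal sketch of K-fold cross-validation follows, assuming each sample is a row of a list `X` with its class label in `y`. The `fit`/`predict` pair is pluggable; the nearest-centroid classifier included here is a hypothetical stand-in chosen only for brevity, not a method recommended by the text.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k folds whose sizes differ by at most one."""
    rng = random.Random(seed)
    idx = list(range(n_samples))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validated_accuracy(X, y, fit, predict, k=5, seed=0):
    """Train k times, each time holding out one fold, and average the
    correct-classification rates over the k held-out folds."""
    folds = k_fold_indices(len(y), k, seed)
    rates = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        rates.append(correct / len(held_out))
    return sum(rates) / len(rates)

def fit_nearest_centroid(X, y):
    """Hypothetical example classifier: the mean expression profile of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict_nearest_centroid(centroids, x):
    """Assign x to the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

# Toy usage: two well-separated groups of made-up two-gene profiles.
X = [[0.1 * i, 0.0] for i in range(10)] + [[10 + 0.1 * i, 10.0] for i in range(10)]
y = [0] * 10 + [1] * 10
print(cross_validated_accuracy(X, y, fit_nearest_centroid, predict_nearest_centroid, k=5))
```

Any other classifier can be evaluated the same way by passing its own training and prediction functions; the essential point is that each held-out fold never contributes to the model that is scored on it.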

A potentially serious mistake is to evaluate classifiers on the same data that were used to train them. When the number of candidate predictors is very large, a relatively large number of them will appear highly correlated with the phenotype of interest purely as a result of random variation in the data. These spurious predictors have no biological foundation and generally do not reproduce outside the sample studied. As a result, evaluating classifiers on the training data tends to give overly optimistic assessments of their validity. In plausible settings, a classifier can appear to classify the training set almost perfectly while having no biological relation to the phenotype (48). Every step of learning a classifier, including the selection of predictor genes, therefore needs to be repeated within each cross-validation split to avoid inflated estimates of performance.
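A small simulation can make this concrete. The sketch below assumes pure-noise data (1000 genes, 40 samples, labels unrelated to expression) and a hypothetical nearest-centroid classifier built on the 10 genes with the largest absolute difference in class means. It compares three estimates: accuracy on the training set, a cross-validation in which the genes are chosen once using all samples, and a cross-validation in which gene selection is repeated inside each training fold. All names and settings are illustrative assumptions, not a procedure from the cited studies.

```python
import random

def simulate_null_data(n_samples=40, n_genes=1000, seed=1):
    """Pure noise: expression values are independent of the class labels."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]  # balanced labels, unrelated to X
    return X, y

def select_genes(X, y, n_keep):
    """Keep the n_keep genes with the largest absolute difference in class means."""
    n1 = sum(y)
    n0 = len(y) - n1
    scores = []
    for g in range(len(X[0])):
        m1 = sum(x[g] for x, lab in zip(X, y) if lab == 1) / n1
        m0 = sum(x[g] for x, lab in zip(X, y) if lab == 0) / n0
        scores.append((abs(m1 - m0), g))
    return [g for _, g in sorted(scores, reverse=True)[:n_keep]]

def fit_centroids(X, y, genes):
    """Class means restricted to the selected genes."""
    return {c: [sum(x[g] for x, lab in zip(X, y) if lab == c) / y.count(c) for g in genes]
            for c in set(y)}

def predict(centroids, x, genes):
    """Nearest centroid in squared Euclidean distance over the selected genes."""
    return min(centroids,
               key=lambda c: sum((x[g] - m) ** 2 for g, m in zip(genes, centroids[c])))

def training_accuracy(X, y, n_keep=10):
    """Select genes, fit, and score on the same samples (the mistake described above)."""
    genes = select_genes(X, y, n_keep)
    model = fit_centroids(X, y, genes)
    return sum(predict(model, x, genes) == lab for x, lab in zip(X, y)) / len(y)

def cv_accuracy(X, y, k=5, n_keep=10, select_inside=True, seed=0):
    """K-fold accuracy, pooled over all held-out samples."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    # Flawed variant: genes chosen once from every sample, including future test folds.
    fixed_genes = None if select_inside else select_genes(X, y, n_keep)
    correct = 0
    for held_out in folds:
        held = set(held_out)
        X_tr = [X[i] for i in idx if i not in held]
        y_tr = [y[i] for i in idx if i not in held]
        # Proper variant: gene selection is redone using the training folds only.
        genes = select_genes(X_tr, y_tr, n_keep) if select_inside else fixed_genes
        model = fit_centroids(X_tr, y_tr, genes)
        correct += sum(predict(model, X[i], genes) == y[i] for i in held_out)
    return correct / len(y)

X, y = simulate_null_data()
print("training-set accuracy:         ", training_accuracy(X, y))
print("CV, genes selected on all data:", cv_accuracy(X, y, select_inside=False))
print("CV, selection inside folds:    ", cv_accuracy(X, y, select_inside=True))
```

Because no gene carries any signal, only the last estimate should hover around the 50% expected by chance; the first two typically come out far higher, which is exactly the optimism that comes from letting the held-out samples influence gene selection.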
