Prediction Analysis of Microarrays (PAM)

A straightforward approach to classification is the nearest centroid classifier. For each class, it computes a centroid given by the average expression levels of the samples in that class, and then assigns each new sample to the class whose centroid is nearest. This approach is similar to k-means clustering, except that the clusters are replaced by known classes. With a large number of genes, the algorithm can be sensitive to noise. A recent enhancement uses shrinkage: for each gene, differences between class centroids are set to zero if they are deemed likely to be due to chance. This approach is implemented in the Prediction Analysis of Microarrays, or PAM (49), software. Shrinkage is controlled by a threshold below which differences are considered noise. Genes that show no difference above the noise level are removed. The threshold can be chosen by cross-validation, as shown in Figure 19.3 for the Hedenfalk data. High thresholds, on the right, include few genes and lead to classifiers that are prone to errors. As the threshold is decreased, more genes are included and the estimated classification error decreases until it reaches a minimum, then starts climbing again as noise genes enter the classifier, a phenomenon known as overfitting.
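The shrinkage idea can be illustrated with a minimal sketch in plain Python. This is not the PAM software itself. It assumes a simplified rule in which each class centroid is pulled toward the overall centroid by a fixed amount `delta` (soft thresholding), and it leaves out the per-gene standardization by within-class variability that PAM applies before thresholding. The function names `soft_threshold`, `fit_shrunken_centroids` and `predict` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict


def soft_threshold(d, delta):
    """Move d toward zero by delta; anything within delta of zero becomes 0."""
    if d > delta:
        return d - delta
    if d < -delta:
        return d + delta
    return 0.0


def fit_shrunken_centroids(X, y, delta):
    """X: list of samples, each a list of gene values; y: class labels.

    Returns the shrunken class centroids and the indices of the genes
    that still distinguish at least one class after shrinkage.
    Simplifying assumption: differences are not standardized per gene.
    """
    n_genes = len(X[0])
    overall = [sum(row[g] for row in X) / len(X) for g in range(n_genes)]

    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    # Shrunken difference between each class centroid and the overall centroid
    shrunk = {}
    for label, rows in by_class.items():
        class_mean = [sum(r[g] for r in rows) / len(rows) for g in range(n_genes)]
        shrunk[label] = [
            soft_threshold(class_mean[g] - overall[g], delta)
            for g in range(n_genes)
        ]

    # A gene is dropped when every class difference has been shrunk to zero
    kept = [g for g in range(n_genes)
            if any(d[g] != 0.0 for d in shrunk.values())]
    centroids = {label: [overall[g] + d[g] for g in range(n_genes)]
                 for label, d in shrunk.items()}
    return centroids, kept


def predict(centroids, kept, sample):
    """Assign the sample to the class with the nearest shrunken centroid."""
    def sq_dist(label):
        return sum((sample[g] - centroids[label][g]) ** 2 for g in kept)
    return min(centroids, key=sq_dist)


# Toy data (made up for illustration): gene 0 separates the classes,
# genes 1 and 2 carry little or no class signal.
X = [[5.0, 1.0, 0.1], [5.2, 0.9, -0.1], [1.0, 1.1, 0.0], [0.8, 1.0, 0.0]]
y = ["A", "A", "B", "B"]
centroids, kept = fit_shrunken_centroids(X, y, delta=0.5)
label = predict(centroids, kept, [4.0, 1.0, 0.0])
```

In this toy example a threshold of 0.5 leaves only gene 0 in the classifier. To choose the threshold in practice, one would fit on part of the data for each candidate `delta`, count errors on the held-out samples, and keep the value with the lowest cross-validated error.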
