K-means clustering and self-organizing maps

K-means clustering (17) partitions objects into groups so that variability is small within clusters and large between clusters. The user must specify the number of clusters, k, a priori. Estimation is iterative: it starts with a random allocation of objects to clusters, re-allocates each object to the cluster whose estimated 'centroid' is nearest, and stops when no further improvement can be made. The centroid of a cluster is the point whose attributes take the mean expression level of the objects in that cluster. K-medoids clustering is similar, except that the center of each cluster is a 'medoid': an actual object in the cluster, chosen to minimize the total dissimilarity to the other members, rather than a computed mean (4). Specifying k can be difficult, though there are ways of gaining insight into the appropriate number of clusters, such as principal components analysis. A closely related approach is that of self-organizing maps (7, 15, 18), now common in the analysis of gene expression data (16).
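The iterative procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in any of the cited work: the function names `kmeans` and `sq_dist`, the use of squared Euclidean distance, the `seed` and `max_iter` parameters, and the rule for re-seeding an empty cluster are all assumptions made for the example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed=0, max_iter=100):
    """Cluster `points` (a list of equal-length numeric tuples) into k groups.

    Returns (labels, centroids), where labels[i] is the cluster of points[i].
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Random initial allocation of objects to clusters.
    labels = [rng.randrange(k) for _ in points]
    centroids = [None] * k

    for _ in range(max_iter):
        # Update step: each centroid is the per-attribute mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:
                # Assumption: an empty cluster is re-seeded with a random object.
                members = [rng.choice(points)]
            centroids[c] = tuple(
                sum(m[d] for m in members) / len(members) for d in range(dim)
            )

        # Assignment step: move each object to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # no object changed cluster, so no improvement is possible
        labels = new_labels

    return labels, centroids
```

Calling `kmeans(profiles, 3)` on a list of expression profiles, each a tuple of measurements, returns one cluster label per profile together with the three centroids. Because the starting allocation is random, different seeds can converge to different partitions, so in practice the procedure is often run several times and the partition with the smallest within-cluster variability is kept.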
