
expression profiles of multiple tissue samples, one can form ''classes.'' Three distinct classification methods are used in microarray studies, discussed in the following paragraphs. Under certain circumstances, the gene expression profile of a tissue can be referred to as a gene expression signature. Like an ordinary handwritten signature, a gene expression signature must be reproducible and unique to a tissue under defined circumstances, and its practical use also requires some method of reliably ''reading'' it. A study reporting ''gene expression signatures'' of a disease must therefore establish all three features: reproducibility of the expression profiles, uniqueness of the profile to the disease, and a reliable method to ''read'' the profile and determine whether it is indeed different from other expression profiles.

By performing multiple microarray experiments on tissue samples from patients with different diseases, one can combine data into data sets in which each row represents a gene and each column a microarray experiment, with the numerical value in row i and column j being the expression level measured for gene i in tissue j. Microarray-based disease classification is essentially the study of such data sets and of methods to organize the columns (individual tissue microarray experiments) into classes with meaningful clinical distinctions (Fig. 1). The three distinct classification methods—comparison, prediction, and discovery—differ principally in their goals and are discussed next in turn.
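The genes-by-tissues layout described above can be sketched directly as a matrix. The gene names, tissue labels, and expression values below are all invented for illustration:

```python
import numpy as np

# Hypothetical combined data set: rows are genes, columns are tissue samples.
rng = np.random.default_rng(0)
n_genes, n_tissues = 5, 4
X = rng.normal(loc=8.0, scale=2.0, size=(n_genes, n_tissues))

gene_names = [f"gene_{i}" for i in range(n_genes)]          # assumed names
tissue_labels = ["disease_A", "disease_A", "disease_B", "disease_B"]

# The expression level of gene i in tissue j is simply X[i, j].
i, j = 2, 1
print(f"{gene_names[i]} in tissue {j} ({tissue_labels[j]}): {X[i, j]:.2f}")
```

Classification methods then operate on the columns of such a matrix, grouping tissues by similarity of their expression profiles.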

In class comparison analysis, the data set consists of ''labeled'' specimens, each of which has a predefined class assignment, and the goal is to understand whether different classes have different expression profiles and to compare and contrast the gene expression differences among the classes. An example is analysis of a gene expression data set consisting of individual gene expression profiles from each of a number of individuals with distinct myopathies (Greenberg et al., 2002). In class prediction analysis, classes are predefined, but the goal is to use the data to build a model to predict the correct class assignment of a new sample (Fig. 2). Such models

Fig. 1. Schematic approach to clustering of tissue samples based on similarity of gene expression patterns. Tissue specimens with similar expression patterns are clustered on the right. (See Color Insert.)

typically take the form of a multivariate function of any number of gene expression measurements but can be as simple as a single threshold on one gene's expression value. In class discovery, the goal is to examine the gene expression correlates of heterogeneity within a single class (Fig. 3). This is fundamentally different from class comparison and prediction. Instead, subgroups of gene expression patterns are sought, such that samples within a subgroup are sufficiently similar to each other and sufficiently distinct from the rest of the group. Class discovery has as its goal the establishment of previously unrecognized subgroups of gene expression profiles that also have important and clinically relevant phenotypic accompaniments. Because disease heterogeneity is nearly universal and poses significant difficulties in the management of individual patients, class discovery has great potential for immediate contributions to the daily practice of medicine.
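The simplest kind of predictor mentioned above, a single threshold on one gene's expression, can be sketched in a few lines. The gene, the cutoff value, and the class labels here are hypothetical:

```python
# Hypothetical single-threshold class predictor: the prediction function is
# the raw expression level of one gene, and the prediction rule is a cutoff.
THRESHOLD = 9.0  # assumed cutoff, e.g. log2 expression of one informative gene

def predict_class(expression_level, threshold=THRESHOLD):
    """Prediction rule: expression above the cutoff -> class 'A', else 'B'."""
    return "A" if expression_level > threshold else "B"

print(predict_class(10.5), predict_class(7.2))
```

Real predictors usually combine many genes into a multivariate score, but the structure, a function of expression values plus a decision rule, is the same.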

Computational approaches to disease classification include cluster analysis and supervised learning techniques (Kohane et al., 2002; Quackenbush, 2001). Cluster analysis partitions expression data into groups using a measure of similarity and an organizational structure that represents this similarity. Cluster analysis methods differ in the similarity measure chosen (e.g., Euclidean distance, Pearson correlation coefficient, and mutual information) (Greenberg, 2001b), the organizational structure used to represent the partition [e.g., hierarchical clustering produces a dendrogram or treelike structure (Eisen et al., 1998), k-means produces groups in a multidimensional space, and relevance networks (Butte et al., 2000) produce graphs with connected subgraphs], and the algorithm used to achieve the partition (e.g., single linkage, average linkage). Hierarchical cluster analysis is widely used in published microarray studies for class comparison and discovery (Alizadeh et al., 2000; Bittner et al., 2000; Greenberg et al., 2002; Hedenfalk et al., 2001; Nielsen et al., 2002; Perou et al., 2000; Sorlie et al., 2001; van't Veer et al., 2002). The approach in class discovery is to perform hierarchical clustering on tissues from a single class and to examine the resulting tree structure for natural divisions that might reflect subclasses. One then needs to find a meaningful phenotypic difference between the subclasses that is statistically significant, such as a difference in survival times.

Fig. 2. Class prediction: building a predictor. (1) From the data, choose a ''gene set'' that will discriminate among classes. (2) Choose a prediction function that, when applied to a new expression profile, produces a real number. (3) Choose a prediction rule that classifies a sample based on the output of the prediction function applied to it. (4) Validate the model.

Fig. 3. Class discovery. Within defined classes, explore variation in a range of phenotypic measures, such as time-to-endpoint (e.g., survival, treatment failure) and categorical or continuous variables (e.g., laboratory results), to establish clinically meaningful phenotypic subclasses.
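The hierarchical class-discovery workflow just described can be sketched with SciPy's clustering routines. The data are synthetic: a single nominal class of six tissues with a hidden two-subclass mean shift invented for illustration, and Euclidean distance with average linkage is one choice among the alternatives named above:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
# Synthetic single-class data set: 6 tissues x 20 genes, with a hidden
# two-subclass structure (a mean shift in expression, invented here).
subclass_a = rng.normal(0.0, 1.0, size=(3, 20))
subclass_b = rng.normal(3.0, 1.0, size=(3, 20))
tissues = np.vstack([subclass_a, subclass_b])

# Pairwise distances between tissues (Euclidean here; correlation distance,
# metric="correlation", is another common choice for expression data).
distances = pdist(tissues, metric="euclidean")

# Average-linkage hierarchical clustering builds the dendrogram structure.
tree = linkage(distances, method="average")

# Cut the tree into two candidate subclasses and inspect the assignments.
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)
```

In a real study the cut of the tree would be followed by the phenotypic step: testing whether the candidate subclasses differ in survival or another clinical variable.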

Supervised learning techniques are used for class prediction (Golub et al., 1999; Radmacher et al., 2002; Simon et al., 2003). Data are used to construct a model (a ''predictor'') that predicts the proper class when a new sample is presented to it. Construction of such a predictor requires the choice of a subset of ''informative'' genes whose expression differs systematically among classes, a multivariate prediction function combining the measurements of these informative genes, and a prediction rule stating which values of the prediction function assign a sample to a particular class. Methods of choosing informative genes (e.g., classification trees, correlation coefficients), of creating prediction functions [e.g., linear weighting functions (Golub et al., 1999), support vector machines (Brown et al., 2000; Ramaswamy et al., 2001), and neural networks (Khan et al., 2001)], and of choosing prediction rules (e.g., threshold values for optimal sensitivity and specificity) account for significant variability among supervised methods. The principal challenge with supervised methods for class prediction is avoiding overfitting, in which a model performs well on the data set from which it was constructed but poorly as a predictor for new data. Although class prediction through supervised learning has an enormous potential impact on clinical medicine, prominent papers in this area often contain seriously flawed analyses that in our opinion are not yet appropriate for application to clinical medicine. The lack of technical expertise among investigators and journal reviewers remains a serious problem in this field (Simon et al., 2003).
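The three ingredients of a predictor, gene selection, prediction function, and prediction rule, can be made concrete in a small simulation. The data are synthetic, and the mean-difference gene ranking and signed linear score below are simplified stand-ins for the published methods, not any particular author's procedure:

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic training set: 40 tissues x 100 genes in two predefined classes.
# That exactly five genes are "truly informative" is an assumption made here.
n_per_class, n_genes = 20, 100
y = np.array([0] * n_per_class + [1] * n_per_class)
X = rng.normal(size=(2 * n_per_class, n_genes))
X[y == 1, :5] += 2.0  # genes 0-4 really differ between the classes

# (1) Choose a gene set: largest absolute difference in class mean expression.
diff = np.abs(X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0))
gene_set = np.argsort(diff)[-5:]

# (2) Prediction function: a signed linear weighting of the chosen genes,
# centered on the overall mean profile, producing a real number.
weights = X[y == 1][:, gene_set].mean(axis=0) - X[y == 0][:, gene_set].mean(axis=0)
midpoint = X[:, gene_set].mean(axis=0)

def prediction_function(sample):
    return float(weights @ (sample[gene_set] - midpoint))

# (3) Prediction rule: a positive score assigns the sample to class 1.
def predict(sample):
    return int(prediction_function(sample) > 0.0)

# (4) Validate on fresh samples drawn from the same simulation.
X_test = rng.normal(size=(40, n_genes))
y_test = np.array([0] * 20 + [1] * 20)
X_test[y_test == 1, :5] += 2.0
accuracy = float(np.mean(np.array([predict(s) for s in X_test]) == y_test))
print(f"validation accuracy: {accuracy:.2f}")
```

The crucial point is step (4): the validation samples were never used in steps (1)-(3). Skipping or partially performing that separation is exactly how the overfitting problem described above arises.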
For example, the Netherlands Cancer Institute in Amsterdam is reportedly using expression levels of 70 genes from microarray-generated tumor profiles of patients with breast cancer, together with a class prediction model (van de Vijver et al., 2002), to determine which women will receive adjuvant treatment after surgery (Schubert, 2003). Although these investigators' model had a reported accuracy of 73% for prediction of breast cancer outcome from gene expression profiles, the methodology was biased by incomplete cross-validation; the unbiased estimate of accuracy was in fact 59%, marginally better than a coin flip (Simon et al., 2003).
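The selection bias behind such inflated estimates can be demonstrated on pure-noise data: if the ''informative'' genes are chosen once from the full data set and cross-validation is then applied only to the classifier, the accuracy estimate is grossly inflated even though no real signal exists. A nearest-centroid classifier and leave-one-out cross-validation are assumed here for simplicity; this is a sketch of the general phenomenon, not a reconstruction of any specific published analysis:

```python
import numpy as np

rng = np.random.default_rng(3)
# Pure-noise data: 40 tissues x 1000 genes; the labels carry no real signal.
n, p, k = 40, 1000, 10
X = rng.normal(size=(n, p))
y = np.array([0] * 20 + [1] * 20)

def top_genes(X_train, y_train, k):
    """Rank genes by absolute difference in class means; keep the top k."""
    diff = np.abs(X_train[y_train == 1].mean(axis=0)
                  - X_train[y_train == 0].mean(axis=0))
    return np.argsort(diff)[-k:]

def nearest_centroid(X_train, y_train, X_test):
    """Assign each test sample to the class with the nearer mean profile."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = ((X_test - c0) ** 2).sum(axis=1)
    d1 = ((X_test - c1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)

def loocv_accuracy(select_within_folds):
    hits = 0
    biased_genes = top_genes(X, y, k)  # selection has already seen every sample
    for i in range(n):
        train = np.arange(n) != i
        genes = (top_genes(X[train], y[train], k)
                 if select_within_folds else biased_genes)
        pred = nearest_centroid(X[train][:, genes], y[train], X[i:i + 1][:, genes])
        hits += int(pred[0] == y[i])
    return hits / n

biased = loocv_accuracy(select_within_folds=False)
unbiased = loocv_accuracy(select_within_folds=True)
print(f"biased estimate: {biased:.2f}, unbiased estimate: {unbiased:.2f}")
```

Because the labels are random, any honest estimate should hover near 50%; only the fully nested procedure, in which gene selection is repeated inside every cross-validation fold, reports that. The partially cross-validated estimate looks impressively accurate on data with no signal at all.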

