Here is an application that illustrates how different roads lead to different gene lists, starting from the same data. The data set is from Golub et al. (11): samples from 47 patients with acute lymphoblastic leukemia (ALL) and 25 patients with acute myeloid leukemia (AML) were hybridized to Affymetrix HU6800 microarrays with probes for 7129 transcripts. The R package golubEsets contains a slightly transformed version of this data set in the object golubMerge. We normalize with vsn (3) and compute absolute scores as described above.

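library(golubEsets)
library(vsn)
data(golubMerge)                        # load the merged Golub data set
golubNorm <- vsn(exprs(golubMerge))     # variance-stabilizing normalization
# (recent vsn releases provide justvsn(), which returns the normalized matrix directly)
X <- exprs(golubNorm)                   # normalized expression matrix, genes x samples
y <- as.numeric(golubMerge$ALL.AML) - 1 # class labels: 0 = ALL, 1 = AML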

Note that twilight.pval returns the scores in the first column of the table result, in which genes are ordered by empirical p-value. To restore the original gene order, index the rows by gene:

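library(twilight)
ttest <- twilight.pval(X, y, method = "t", paired = FALSE)
# 'genes' is assumed to hold the gene identifiers in the original row order of X
score <- ttest$result[genes, 1]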
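A minimal base-R sketch of the reordering step, using a made-up three-gene table in place of real twilight.pval output (the gene names, scores, and p-values are hypothetical). It assumes, as the indexing above implies, that the rows of result are named by gene identifier; indexing the rows with a vector of names then returns them in that vector's order.

```r
# Hypothetical results table, already sorted by p-value;
# row names are gene identifiers (an assumption for this sketch)
result <- data.frame(
  observed  = c(5.1, -3.2, 0.4),
  pvalue    = c(0.001, 0.02, 0.7),
  row.names = c("geneC", "geneA", "geneB")
)

# Hypothetical gene identifiers in the original row order of the expression matrix
genes <- c("geneA", "geneB", "geneC")

# Indexing rows by name restores the original order; column 1 holds the scores
score <- result[genes, 1]
names(score) <- genes
print(score)
```

Indexing by name rather than by position keeps the scores aligned with the expression matrix, whatever order the p-value sort produced.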

Table 18.1 displays the ranks of the genes ranked highest by t-score under each scoring method, together with selected other ranks. The first four columns contain ranks of t-like scores. Going from left to right, the scoring methods put more weight on the difference in means and less weight on a gene's variance (recall that the log ratio score is simply the difference in means). Ranks of genes with small differences and small variances increase from t-test to log ratio, that is, these genes move down the list; examples are CD33 and MLP. Ranks of genes with larger differences but large variances decrease, so these genes move up; examples are FCER1G and SPI1. Comparing t and Wilcoxon ranks highlights genes whose scores are affected by outlying expression values, for example gene DF. The last column contains the ranks of combined pAUC scores, which in this example lead to quite similar results.
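The trade-off between mean difference and variance can be sketched in base R on two made-up genes: one with a small group difference and small variance, one with a larger difference and large variance. As an assumed stand-in for the intermediate t-like columns, the sketch adds a fudge constant s0 to the denominator of the t-statistic (a SAM-style device); this is not necessarily the exact set of scores used in Table 18.1. With s0 = 0 the score is the absolute Welch t-statistic, and as s0 grows the ranking approaches that of the absolute difference in means, i.e. the log ratio.

```r
# Hypothetical log-scale expression values: rows are genes, columns are samples
X <- rbind(
  smallDiffSmallVar = c(1.0, 1.1, 0.9, 1.0, 1.0,  1.3, 1.4, 1.2, 1.3, 1.3),
  largeDiffLargeVar = c(1, 3, 2, 0, 4,            4, 6, 5, 3, 7)
)
y <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)  # class labels, 0 vs 1

# Absolute score |mean1 - mean0| / (s + s0), with s the Welch standard error.
# s0 = 0 gives |t|; a large s0 makes the ranking follow |mean1 - mean0|.
fudgeScore <- function(X, y, s0 = 0) {
  g0 <- X[, y == 0, drop = FALSE]
  g1 <- X[, y == 1, drop = FALSE]
  d  <- rowMeans(g1) - rowMeans(g0)
  s  <- sqrt(apply(g1, 1, var) / ncol(g1) + apply(g0, 1, var) / ncol(g0))
  abs(d) / (s + s0)
}

scores <- cbind(
  t        = fudgeScore(X, y, s0 = 0),
  fudge1   = fudgeScore(X, y, s0 = 1),
  logratio = abs(rowMeans(X[, y == 1]) - rowMeans(X[, y == 0]))
)

# Rank 1 = highest absolute score within each column
ranks <- apply(-scores, 2, rank)
print(round(scores, 3))
print(ranks)
```

The first gene has |t| of about 6.7 against 3 for the second, so it ranks first by t-score. Adding s0 = 1 shrinks its score far more than the second gene's, and the order flips, matching the log ratio ranking.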
