Typically, genes with large changes in expression level over time relative to their replicate variances are the best candidates for follow-up. However, given the thousands of genes in a microarray time course experiment and the small number of replicates, the variances (in the case of cross-sectional data) or variance-covariance matrices (in the case of longitudinal data) are usually very poorly estimated. As a result, some genes that exhibit relatively small changes over time and small replicate variances may have large between-to-within time F-statistics because their denominators are under-estimated. We may conclude that such genes are changing over time; if they are not, they are false positives. Jiang et al. (35), for example, mentioned such genes, and Figure 20.1 shows one from Tomancak et al. (8). The F-statistic ranks this gene higher than many other genes that exhibit greater change in expression level over time, so we consider it a false positive. Conversely, some genes with large changes over time but also large replicate variances may have small F-statistics because their denominators are over-estimated. Such genes may be false negatives: they may be changing over time without being identified as such. The gene in Figure 20.2 is clearly changing over time; however, the expression level of experiment B at 10 h is much lower than those of experiments A and C. This single outlier gives the gene a lower ranking under the F-statistic than under any of the other statistics mentioned below, yet the gene is very likely to be of interest. By moving (shrinking) gene-specific variances or variance-covariance matrices towards a common value estimated from the whole gene set, the total number of false positives and false negatives can usually be reduced. This is what we term moderation.

Figure 20.1. A probable false positive gene (see text). Ranks: longitudinal MB = 7613, cross-sectional MB = 6326, moderated F = 6210, F = 5033.

Figure 20.2. A probable false negative gene (see text). Ranks: longitudinal MB = 36, cross-sectional MB = 3136, moderated F = 4157, F = 4732.
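The effect of shrinking the denominator can be illustrated with a minimal sketch for cross-sectional data. It is not the estimator of any of the cited papers: the function name `f_statistics`, the fixed prior weight `d0`, and the use of the plain mean of the gene-wise variances as the common value are all assumptions made for illustration.

```python
import numpy as np

def f_statistics(y, d0=4.0):
    """Ordinary and moderated between-to-within time F-type statistics.

    y  : array of shape (genes, time_points, replicates)
    d0 : hypothetical prior degrees of freedom (fixed here, not estimated)
    """
    _, k, n = y.shape
    time_means = y.mean(axis=2)                          # (genes, k)
    grand_mean = time_means.mean(axis=1, keepdims=True)  # (genes, 1)

    # Between-time mean square: how much the time-point means move.
    ms_between = n * ((time_means - grand_mean) ** 2).sum(axis=1) / (k - 1)

    # Within-time mean square: the gene-specific replicate variance.
    d = k * (n - 1)
    ms_within = ((y - time_means[:, :, None]) ** 2).sum(axis=(1, 2)) / d

    # Assumption: the common value is the mean variance over all genes.
    s0_sq = ms_within.mean()

    # Shrink each gene's variance towards the common value.
    ms_moderated = (d0 * s0_sq + d * ms_within) / (d0 + d)

    return ms_between / ms_within, ms_between / ms_moderated
```

With this weighting, a gene whose replicate variance happens to be tiny no longer receives an inflated statistic, while a gene with a large variance is penalized less severely. How strongly to shrink (here `d0`) is exactly what the methods discussed next estimate from the data.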

The idea of moderation has entered the analysis of microarray data in different forms. Efron et al. (21) and Tusher et al. (22) tuned the t-statistic by adding a suitable constant to the standard deviation, the constant being estimated as a percentile of the sample standard deviations and by minimizing a coefficient of variation, respectively. Their approaches are not based on any distributional theory. Lönnstedt and Speed (24) brought the idea of moderation into the univariate hierarchical Bayesian mixture model for two-channel comparative experiments by smoothing gene-specific residual sample variances towards a common value. Smyth (29) formally introduced the moderated t-statistic in the univariate general linear model setting by replacing the denominator of the t-statistic with a moderated denominator. The gene-specific moderated sample variance in Smyth (29) is defined on the basis of distributional theory, and the hyperparameter estimates derived there are shown to perform better than those in Lönnstedt and Speed (24). Tai and Speed (34) further extended the univariate models of Lönnstedt and Speed (24) and Smyth (29) to multivariate settings and introduced the MB-statistic (multivariate empirical Bayes statistic) and the T² statistic to rank genes in order of differential expression in the one- and two-sample cases in the longitudinal time course context. In addition, Tai and Speed (36, 37) derived MB-statistics for D > 2 samples for longitudinal and cross-sectional data.
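The two univariate flavours of moderation described above can be contrasted in a short sketch. The function names are hypothetical, the default percentile of 90 is an arbitrary choice for illustration, and the hyperparameters `s0_sq` and `d0` are simply passed in; estimating them from the data, as Smyth (29) does, is not attempted here.

```python
import numpy as np

def offset_t(effect, se, percentile=90.0):
    """t-type statistic with a constant added to the denominator.

    The constant is a percentile of the gene-wise standard errors;
    the default of 90 is an assumption made for this sketch.
    """
    a0 = np.percentile(se, percentile)
    return effect / (se + a0)

def moderated_t(effect, s_sq, d, v, s0_sq, d0):
    """Moderated t-statistic with a shrunken variance in the denominator.

    effect : gene-wise estimated effects (e.g. log-ratios)
    s_sq   : gene-wise sample variances on d degrees of freedom
    v      : unscaled variance factor of the effect estimate
    s0_sq  : prior (common) variance, supplied by the caller
    d0     : prior degrees of freedom, supplied by the caller
    """
    # Weighted average of the prior variance and the sample variance.
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)
    return effect / np.sqrt(s_tilde_sq * v)
```

The first function only offsets the denominator by an ad hoc constant, whereas the second replaces the sample variance by a weighted average whose weights are degrees of freedom; as `d0` tends to zero the moderated statistic reduces to the ordinary t-statistic.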
