## $\log_2 B(PM_{ij}) = a_i + m_j + \varepsilon_{ij}$

for $i = 1, \ldots, I$ and $j = 1, \ldots, J$. The quantity $B(PM_{ij})$ is the background-adjusted, normalized PM intensity, $a_i$ is a probe affinity effect, $m_j$ is a quantity

Fig. 7. A plot of background-adjusted and normalized log PM intensities against concentration for a spike-in probe set suggests an additive model. (See Color Insert.)


proportional to the amount of transcript on array $j$, and $\varepsilon_{ij}$ is an independent, identically distributed error term with mean 0. For identifiability of the parameters, we assume that the $a_i$ sum to 0 for each gene. Notice that this assumption translates to assuming that the Affymetrix technology has probes with expected intensities that, on average, are representative of the associated gene expression.

Under this model, an unbiased estimate of $m_j$ for each array $j$ can be obtained by averaging $\log_2(B(PM_{ij}))$ across the $i = 1, \ldots, I$ probes. This average can be used as a simple expression measure. We can demonstrate empirically that this expression measure works well. If the errors are normally distributed, this estimate is optimal according to various statistical criteria. However, many researchers (Li and Wong, 2001b) have observed that outliers (observations too extreme to occur under the normality assumption) are relatively common. For some arrays, the proportion of outliers is as high as 15%. This suggests that the aforementioned model should be fit using robust procedures.
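The unbiasedness claim follows in one line from the sum-to-zero constraint on the probe effects. The derivation below is a short worked sketch; the shorthand $\bar{y}_j$ for the probe average and $\bar{\varepsilon}_j$ for the averaged error are notation introduced here, not symbols from the original text.

```latex
\begin{align*}
\bar{y}_j
  &= \frac{1}{I}\sum_{i=1}^{I} \log_2 B(PM_{ij})
   = \frac{1}{I}\sum_{i=1}^{I} \left( a_i + m_j + \varepsilon_{ij} \right) \\
  &= m_j + \frac{1}{I}\underbrace{\sum_{i=1}^{I} a_i}_{=\,0}
       + \frac{1}{I}\sum_{i=1}^{I} \varepsilon_{ij}
   = m_j + \bar{\varepsilon}_j ,
\qquad
\mathrm{E}\left[\bar{y}_j\right] = m_j .
\end{align*}
```

Because $\bar{y}_j$ is a plain mean, a single extreme $\varepsilon_{ij}$ shifts it by that error divided by $I$, which is why outliers motivate a robust fit.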

Median polish is a simple ad hoc procedure for fitting such a model robustly (Holder et al., 2002; Tukey, 1977). Irizarry et al. (2003a) demonstrate that the expression measure obtained using median polish provides estimates with comparable accuracy to and much better precision than the two leading expression measures, namely those obtained from MAS 5.0 and from dChip MBEI (Li and Wong, 2001a,b). Irizarry et al. (2003a,b) call this procedure the Robust Multi-Array Analysis (RMA).
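As a rough illustration of how median polish fits an additive probe-plus-array model, here is a minimal sketch in Python with NumPy. It is not the RMA implementation: the function name `median_polish`, the iteration limit, the tolerance, and the toy numbers are all assumptions made for this example. Note also that median polish centers the probe effects at median 0 rather than sum 0.

```python
import numpy as np


def median_polish(y, max_iter=10, tol=1e-6):
    """Fit y[i, j] ~ overall + row[i] + col[j] by alternating median sweeps.

    Rows are probes and columns are arrays. This is an illustrative sketch,
    not the code used by any particular RMA implementation.
    """
    resid = np.asarray(y, dtype=float).copy()
    n_rows, n_cols = resid.shape
    overall = 0.0
    row = np.zeros(n_rows)
    col = np.zeros(n_cols)
    for _ in range(max_iter):
        # Sweep the row (probe) medians out of the residuals.
        r = np.median(resid, axis=1)
        resid -= r[:, None]
        row += r
        # Move the median of the column effects into the overall term.
        d = np.median(col)
        col -= d
        overall += d
        # Sweep the column (array) medians out of the residuals.
        c = np.median(resid, axis=0)
        resid -= c[None, :]
        col += c
        # Move the median of the row effects into the overall term.
        d = np.median(row)
        row -= d
        overall += d
        # Stop once a full pass no longer changes anything noticeably.
        if np.abs(r).max() < tol and np.abs(c).max() < tol:
            break
    return overall, row, col, resid


# Toy data (made up): 3 probes, 4 arrays, exactly additive plus one outlier.
a = np.array([-1.0, 0.0, 1.0])        # probe affinity effects
m = np.array([5.0, 7.0, 9.0, 6.0])    # array-level log2 expression
y = a[:, None] + m[None, :]
y[0, 1] += 10.0                       # a single gross outlier

overall, row, col, resid = median_polish(y)
expression = overall + col            # per-array expression estimate
print("median polish:", expression)
print("probe average:", y.mean(axis=0))
```

In this toy case the outlier ends up in the residuals rather than in the array estimate, whereas the plain probe average for the affected array is pulled upward.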

The additive model lends itself to various practical extensions. For example, if we are comparing two populations of RNA species for which we have many technical replicates that we assume have the same expected RNA expression, we can write $\log_2(B(PM_{ijk})) = a_i + m_j + \varepsilon_{ijk}$ for $i = 1, \ldots, I$, $j = 1, \ldots, J$, and $k = 1, \ldots, K$. The estimate of $m_j$ would then be based on $K$ times more data than in RMA. If we had biological replicates instead of technical replicates, we could add a $Z_j$ term to the model, representing a random effect (Chu et al., 2002).
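To make the replicate extension concrete, the sketch below fits the model with replicates by ordinary least squares on a balanced design, where the estimates reduce to simple means under the sum-to-zero constraint on the probe effects. This is a non-robust illustration only, with no random effect. The function name `fit_replicate_model` and the array layout (probes by populations by replicates) are assumptions made for this example.

```python
import numpy as np


def fit_replicate_model(y):
    """Least-squares fit of y[i, j, k] ~ a[i] + m[j] with sum(a) == 0.

    y has shape (I, J, K): probes x populations x replicates.
    For a balanced design the least-squares estimates are plain means.
    """
    y = np.asarray(y, dtype=float)
    grand = y.mean()
    # Each m[j] is averaged over all I probes and all K replicates,
    # so it uses K times as many values as a single-array estimate.
    m_hat = y.mean(axis=(0, 2))
    # Probe effects are deviations of the probe means from the grand mean.
    a_hat = y.mean(axis=(1, 2)) - grand
    return a_hat, m_hat


# Toy data (made up): 3 probes, 2 populations, 4 replicates, no noise.
a = np.array([-1.0, 0.0, 1.0])
m = np.array([5.0, 8.0])
y = a[:, None, None] + m[None, :, None] + np.zeros((3, 2, 4))

a_hat, m_hat = fit_replicate_model(y)
print("a_hat:", a_hat)
print("m_hat:", m_hat)
```

A robust alternative would replace the means with a procedure such as median polish; the least-squares version is used here only because it shows the pooling over replicates most directly.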
