## Statistical analysis: A first visual inspection

As in the previous section, we start with something very simple: a visual approach that is part of the method SAM (Significance Analysis of Microarrays) by Tusher et al. (8). The Microsoft© Excel plug-in software is available at http://www-stat.stanford.edu/~tibs/SAM/. SAM computes scores for differential gene expression as given in the last section (observed scores). In addition, it simulates randomness by shuffling the patient labels and computes expected scores. Roughly speaking, these are the scores that would occur if all genes in the experiment were non-induced. Plotting expected versus observed scores displays how much your data deviate from random noise. Figure 18.3 shows SAM-plots for three situations: data sets with low, medium and high contents of differentially expressed genes. The diagonal line denotes perfect agreement between your data and random data. The more the points deviate from the diagonal, the more evidence for differential expression you have. Up-regulated genes result in points above the line and down-regulated genes result in points below the line. The number of points deviating from the diagonal gives you a first hint of the level of differential gene expression.
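The idea of observed versus expected scores can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the SAM implementation: the data are simulated, the helper `t_score` is hypothetical, and a plain Welch t-statistic stands in for the score of the previous section (SAM itself uses a moderated statistic). Expected scores are taken as the average of the sorted scores over label permutations.

```r
set.seed(1)

# Simulated data (assumption): 200 genes x 12 samples, two groups of six.
n_genes <- 200
labels  <- rep(c(0, 1), each = 6)
x <- matrix(rnorm(n_genes * length(labels)), nrow = n_genes)
# Shift the first 20 genes upwards in group 1 to mimic induced genes.
x[1:20, labels == 1] <- x[1:20, labels == 1] + 2

# Hypothetical helper: Welch t-statistic per gene (row).
t_score <- function(x, labels) {
  a  <- x[, labels == 1, drop = FALSE]
  b  <- x[, labels == 0, drop = FALSE]
  se <- sqrt(apply(a, 1, var) / ncol(a) + apply(b, 1, var) / ncol(b))
  (rowMeans(a) - rowMeans(b)) / se
}

# Observed scores, sorted from smallest to largest.
observed <- sort(t_score(x, labels))

# Shuffle the labels B times, sort each set of permutation scores,
# and average the order statistics to obtain the expected scores.
B <- 100
perm_scores <- replicate(B, sort(t_score(x, sample(labels))))
expected <- rowMeans(perm_scores)

# SAM-like plot: points far from the diagonal hint at differential expression.
plot(expected, observed, xlab = "expected score", ylab = "observed score")
abline(0, 1)
```

With the simulated shift in the first 20 genes, the upper right end of the point cloud should bend away from the diagonal, while the remaining genes stay close to it.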

Figure 18.3.


SAM-plots for three simulated data sets with expected scores on the x-axes and observed scores on the y-axes. Diagonal lines denote equality between observed and expected scores. From left to right: Low, medium and high content of induced genes.

In R, use the command `score <- twilight.pval( ... )` to compute observed and random scores. The values are stored in matrix `score$result` as `observed` and `expected`. For convenience, use `plot(score,"scores")` to get a SAM-like plot.
