Normalization methods

Microarray data are frequently displayed on a logarithmic scale. Graphical displays of microarray intensities and ratios make the reason clear: the log scale gives a more readable picture. A histogram of raw microarray intensities normally has a shape similar to a geometric distribution, whereas a histogram of the log-transformed data has a shape similar to a normal distribution. Another example is the display of ratios when comparing two RNA samples. Plotting the intensities of the two samples against each other yields a display in which most data points are clustered in the lower left-hand corner of the plot. Plotting the log-product (log(sample1 × sample2)) against the log-ratio (log(sample1/sample2)) is more informative (see Figure 17.1), and the log-ratio can be interpreted as a measure of differential gene expression.
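As a minimal sketch of the two quantities plotted in such a display, the following Python fragment computes the log-product and the log-ratio for paired intensities of two samples. The function name, the choice of base 2 and the rule of skipping non-positive values are assumptions made for illustration; the text itself does not fix a base or a way of handling such values.

```python
import math


def log_ratio_and_product(sample1, sample2):
    """Return (log-product, log-ratio) pairs for paired intensities.

    Base-2 logarithms are an assumption of this sketch. Pairs with a
    non-positive intensity are skipped because their logarithm is undefined.
    """
    points = []
    for x1, x2 in zip(sample1, sample2):
        if x1 <= 0 or x2 <= 0:
            continue  # logarithm undefined for zero or negative values
        log_product = math.log2(x1) + math.log2(x2)  # log2(x1 * x2)
        log_ratio = math.log2(x1) - math.log2(x2)    # log2(x1 / x2)
        points.append((log_product, log_ratio))
    return points


# Hypothetical intensities for three spots measured in both samples.
for log_product, log_ratio in log_ratio_and_product([64, 256, 1024],
                                                    [64, 64, 4096]):
    print(f"log-product = {log_product:.1f}, log-ratio = {log_ratio:+.1f}")
```

A log-ratio of zero then corresponds to equal intensity in both samples, and positive or negative values indicate higher intensity in the first or the second sample, respectively.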

Thus, using a logarithmic scale evens out skewed distributions of the data and gives a more realistic picture of outliers when the log-product is displayed against the log-ratio of two samples. Furthermore, taking the logarithm of the intensities turns multiplicative effects into additive ones.
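The last point can be illustrated with a small sketch. It assumes a hypothetical multiplicative effect, here a gain factor of 4 applied to every intensity, and shows that on the raw scale the resulting change grows with the intensity, while on the log scale it is the same constant offset for every spot. The names and numbers are illustrative only.

```python
import math


def raw_shift(intensity, gain):
    """Change caused by a multiplicative gain on the raw scale."""
    return gain * intensity - intensity


def log_shift(intensity, gain):
    """Change caused by the same gain on the log2 scale (equals log2(gain))."""
    return math.log2(gain * intensity) - math.log2(intensity)


intensities = [50.0, 200.0, 800.0, 3200.0]  # hypothetical raw intensities
gain = 4.0                                  # hypothetical multiplicative effect

raw_shifts = [raw_shift(x, gain) for x in intensities]
log_shifts = [log_shift(x, gain) for x in intensities]

print("raw scale:", raw_shifts)  # grows with the intensity
print("log scale:", log_shifts)  # constant offset of log2(gain)
```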

As already mentioned, on a logarithmic scale the intensities are spread rather evenly across their dynamic range, which is not the case for the untransformed data. This is a big advantage for visualization but not necessarily for analysis. Problems arise with the many low-intensity values and with the zero or negative values that frequently appear after background subtraction. For low intensities the plot becomes very strongly scattered, and for zero or negative values the logarithm is not even defined.
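Both problems can be made concrete with a short sketch. It assumes a simple background correction (foreground minus background) and a fixed additive measurement error of ±5 intensity units; the function names and all numbers are hypothetical. The first function returns None where the logarithm is undefined, and the second shows how the same additive error spreads much further on the log scale when the signal is low.

```python
import math


def log2_after_background(foreground, background):
    """Log2 of the background-corrected intensity, or None if undefined."""
    corrected = foreground - background
    if corrected <= 0:
        return None  # zero or negative after subtraction: no logarithm
    return math.log2(corrected)


def log2_spread(signal, error):
    """Width on the log2 scale of the interval signal - error .. signal + error."""
    return math.log2(signal + error) - math.log2(signal - error)


# A spot whose foreground is below the local background has no log value.
print(log2_after_background(120.0, 150.0))
print(log2_after_background(1150.0, 150.0))

# The same additive error of 5 units at a low and at a high signal.
print(log2_spread(10.0, 5.0))
print(log2_spread(1000.0, 5.0))
```

How such undefined spots are treated (dropped, flagged or replaced) is a design decision that this sketch deliberately leaves open.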

Typically, log-ratios are normally distributed, and at least for high intensities their variance is independent of the intensity, which is another advantage of the log transformation for data analysis. However, this does not hold for low intensities. There, the variance depends on the intensity: the variance of the log-ratios decreases with increasing log-product values. Thus, when visualizing log-ratios, one has to take into account that the variance is not constant across the whole dynamic range. To overcome this problem, variance-stabilizing normalization has been introduced (8, 9), which is reviewed at the end of this section.
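The intensity dependence of the variance can be reproduced in a small simulation. The sketch below assumes an error model with a multiplicative and an additive noise component (y = mu * exp(eta) + eps); the model, the noise levels and the scale parameter c are assumptions chosen for illustration and are not taken from the text. It compares the variance of log-ratios between replicate measurements at a low and at a high true intensity, and then does the same for an arsinh transform, which is one transform of the kind commonly used for variance stabilization. Whether it matches the procedure of references (8, 9) in detail is not claimed here.

```python
import math
import random
import statistics


def simulate_pairs(mu, n, rng, sd_mult=0.1, sd_add=20.0):
    """Simulate n replicate pairs with multiplicative and additive noise."""
    pairs = []
    for _ in range(n):
        y1 = mu * math.exp(rng.gauss(0.0, sd_mult)) + rng.gauss(0.0, sd_add)
        y2 = mu * math.exp(rng.gauss(0.0, sd_mult)) + rng.gauss(0.0, sd_add)
        pairs.append((y1, y2))
    return pairs


def log_ratio_variance(pairs):
    """Variance of log2 ratios; pairs with non-positive values are dropped."""
    ratios = [math.log2(a / b) for a, b in pairs if a > 0 and b > 0]
    return statistics.variance(ratios)


def arsinh_diff_variance(pairs, c):
    """Variance of arsinh(a / c) - arsinh(b / c); defined for all real values."""
    diffs = [math.asinh(a / c) - math.asinh(b / c) for a, b in pairs]
    return statistics.variance(diffs)


rng = random.Random(0)                       # fixed seed for reproducibility
low = simulate_pairs(50.0, 2000, rng)        # hypothetical low intensity
high = simulate_pairs(5000.0, 2000, rng)     # hypothetical high intensity

c = 20.0 / 0.1  # assumed scale: additive sd divided by multiplicative sd

var_log_low = log_ratio_variance(low)
var_log_high = log_ratio_variance(high)
var_asinh_low = arsinh_diff_variance(low, c)
var_asinh_high = arsinh_diff_variance(high, c)

print("log-ratio variance, low / high intensity:", var_log_low, var_log_high)
print("arsinh variance,    low / high intensity:", var_asinh_low, var_asinh_high)
```

Under these assumptions the log-ratio variance is much larger at the low intensity, while the arsinh-based differences have roughly the same variance at both intensities. In practice the scale parameter would have to be estimated from the data rather than set by hand as it is here.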
