Data analysis

The analysis of DNA microarray data is discussed in detail in other chapters and is covered only briefly here. All microarray experiments produce large, noisy datasets, and the challenge is to process them in a way that yields the most reliable and meaningful results. The data acquisition and analysis processes are fairly well established: first the slides are scanned to measure fluorescence intensity; signals from the features are then scored and normalized; differential expression is determined; and finally statistical methods and clustering algorithms are employed to generate meaningful datasets. The scanning and quantification steps are generally straightforward; however, the normalization of the raw data and the calling of 'hits' are subject to considerable experimental variability.

Both differential expression and ChIP-chip experiments rely on the ratio of 'reference' signals to 'experimental' or 'test' signals. However, it quickly becomes apparent to the experimentalist that several factors can skew the raw data, so that ratios taken at face value are erroneous. These include differences in dye intensity, regional heterogeneity on the slide, faulty washing that leaves behind dust and salt residues, and differences in spot intensity. Inconsistencies introduced in the hybridization reaction and subsequent washing steps, as well as experimental noise, can alter the final outcome of the experiment if they are not accounted for (13).
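As an illustration of how a global dye-intensity bias can be removed from such ratios, the following is a minimal sketch of median centering of log2 ratios. It is not the method of any particular package; the function names, the pseudocount parameter, and the intensity values are all hypothetical, and the intensities are assumed to be already background corrected.

```python
import math
from statistics import median

def log_ratios(test, reference, pseudocount=1.0):
    """Return log2(test / reference) for each spot.

    The pseudocount (an assumption of this sketch) guards against
    taking the logarithm of zero for empty spots.
    """
    return [
        math.log2((t + pseudocount) / (r + pseudocount))
        for t, r in zip(test, reference)
    ]

def median_center(values):
    """Subtract the median so that the typical spot has a log ratio of 0."""
    m = median(values)
    return [v - m for v in values]

# Hypothetical intensities: the test channel is uniformly twice as bright
# as the reference channel (a dye bias), and only the last spot is enriched
# beyond that bias.
test = [200.0, 400.0, 800.0, 1600.0, 3200.0]
reference = [100.0, 200.0, 400.0, 800.0, 400.0]

raw = log_ratios(test, reference, pseudocount=0.0)
normalized = median_center(raw)
print(normalized)
```

Median centering assumes that most spots are unchanged between the two channels. It corrects only a slide-wide bias; regional heterogeneity and intensity-dependent effects would need a more local correction.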

To help microarray users generate meaningful datasets in a consistent and automated fashion, various web-accessible data processing platforms have been developed. One example is the ExpressYourself program developed by bioinformaticians at Yale University. The aim of this program is to address each aspect of the data analysis process, including background correction, intra- and inter-slide normalization, data quality and scoring. One feature that sets this program apart from others is that it allows users to select from a variety of scoring algorithms, some of which have been specifically tailored to handle datasets generated by ChIP-chip experiments. Typically, researchers identify targets that have significant p values of less than 10^-4, or that lie a given number of standard deviations from the median, in multiple experiments (13).
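The idea of calling targets by their distance from the median across multiple experiments can be sketched as follows. This is a simplified illustration and not the scoring algorithm used by ExpressYourself; the function names, the threshold of 2.0 standard deviations, the requirement of two replicates, and the data are all assumptions made for the example.

```python
from statistics import median, pstdev

def deviation_scores(log_ratios):
    """Distance of each spot from the median, in units of standard deviation."""
    m = median(log_ratios)
    sd = pstdev(log_ratios)
    if sd == 0:
        return [0.0 for _ in log_ratios]
    return [(v - m) / sd for v in log_ratios]

def call_hits(replicates, threshold=2.0, min_replicates=2):
    """Return indices of spots that exceed the threshold in enough replicates.

    threshold and min_replicates are hypothetical defaults, not
    recommended values.
    """
    counts = [0] * len(replicates[0])
    for rep in replicates:
        for i, score in enumerate(deviation_scores(rep)):
            if score >= threshold:
                counts[i] += 1
    return [i for i, c in enumerate(counts) if c >= min_replicates]

# Hypothetical normalized log ratios for 10 spots in three experiments.
# Spot 9 is enriched in two experiments; spot 0 is enriched in only one.
replicates = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0],
    [0.1, -0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.5],
    [3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]

hits = call_hits(replicates)
print(hits)
```

Requiring a spot to pass in more than one experiment filters out spots that score highly only once, such as spot 0 here, which is the purpose of replication in this context.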
