Judging by the steeply increasing number of reports in the literature, Affymetrix DNA oligonucleotide arrays (GeneChips) have gained considerable acceptance in the research community. The latest version of the human GeneChips, U133 Plus 2.0, represents approximately 38 500 transcripts on a single chip and allows genome-wide expression profiling in a very convenient setting. In contrast to cDNA and oligonucleotide arrays from other manufacturers, Affymetrix GeneChips measure each transcript with a set of multiple probes (called a probe set), which usually consists of 11 probe pairs. Each probe pair contains two 25mer oligonucleotide probes: a perfect match (PM) oligonucleotide that represents part of the cDNA sequence of interest and a mismatch (MM) oligonucleotide, which is identical to the PM oligonucleotide except for a mismatch mutation at the central position. In the Affymetrix image analysis method implemented in the MAS5.0 and GCOS software, the PM-MM signal differences of the 11 probe pairs of a probe set are converted into a single probe set signal value, which is a measure of the abundance of the transcript. In addition, for each probe set the software estimates the reliability of the measurement, resulting in a detection call (present, absent, or marginal) based on a significance analysis of the PM-MM differences.

To minimize discrepancies due to varying sample preparations, hybridization conditions, staining intensities or probe array lots, the software provides several normalization options, which can be applied to the datasets from different arrays. The recommended procedure for datasets with relatively small expression differences is called 'global scaling'. During global scaling, the software examines all probes on the array to compute a trimmed mean signal. Then, a scaling factor is calculated and applied to each signal on the array to standardize the trimmed mean of the array to a user-specified target signal. Another option, called 'selected probe sets scaling', computes the trimmed mean signal of selected probe sets only. The resulting scaling factor is again applied to all probes, so that the trimmed mean signal of the selected probe sets equals the user-specified target signal on every array of a given dataset. Selected probe sets scaling is more appropriate and provides more accurate signal measurements if differences between samples are relatively large. However, it requires a set of transcripts known to be equally abundant in all samples, such as a fixed amount of external controls spiked into a constant amount of starting material.
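The two scaling options described above can be illustrated with a minimal Python sketch. It is not the MAS5.0 or GCOS implementation: the function names, the dictionary layout (probe set ID mapped to signal) and the default trim fraction of 2% at each end are assumptions made for illustration only.

```python
def trimmed_mean(values, trim_fraction=0.02):
    """Mean after discarding trim_fraction of the values at each end.

    The 2% default is an assumption for this sketch.
    """
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k] if k > 0 else ordered
    if not kept:
        raise ValueError("no values left after trimming")
    return sum(kept) / len(kept)


def scaling_factor(signals, target, selected=None, trim_fraction=0.02):
    """Factor that moves the trimmed mean of the reference signals to target.

    signals  -- dict mapping probe set ID -> signal (hypothetical layout)
    selected -- None for global scaling, or an iterable of probe set IDs
                for selected probe sets scaling
    """
    if selected is None:
        reference = list(signals.values())
    else:
        reference = [signals[probe_set] for probe_set in selected]
    return target / trimmed_mean(reference, trim_fraction)


def scale_array(signals, target, selected=None, trim_fraction=0.02):
    """Apply the scaling factor to every signal on the array."""
    factor = scaling_factor(signals, target, selected, trim_fraction)
    return {probe_set: s * factor for probe_set, s in signals.items()}
```

In both modes the factor is applied to all signals on the array; only the set of signals used to compute the trimmed mean differs. Calling `scale_array(signals, target)` corresponds to global scaling, while passing `selected=` with the IDs of spiked-in controls corresponds to selected probe sets scaling.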

Most probe sets cover a target sequence of about 300 bases in length located within the region of 600 bases proximal to the 3'-end of the transcripts, mostly in non-translated regions, where distinction of transcripts encoded by highly homologous gene families is facilitated. To select these probe set sequences, information from multiple public domain databases as well as proprietary information of Affymetrix is used. In cases where database entries suggest the occurrence of alternative splicing or polyadenylation, multiple probe sets covering different regions of a gene can be present on the GeneChip arrays. Information on each probe set, including the sequences interrogated by the probe pairs and the sequences of the oligonucleotides chosen, has been made accessible to the public via an internet platform called NetAffx Analysis Center (

To be analyzed on GeneChips, mRNA molecules contained in total RNA samples are amplified and labeled by a standardized and widely used protocol (Figure 10.1A), which involves conversion of mRNA molecules into double-stranded cDNA using an oligo(dT)21 primer with a T7 RNA polymerase promoter tag for first strand synthesis. After second strand cDNA synthesis, subsequent in vitro transcription by T7 RNA polymerase yields biotinylated anti-sense copy RNA, termed cRNA target, in an amount sufficient for the hybridization of several GeneChip arrays. Starting with a perfectly intact RNA sample, the resulting cRNA target should ideally represent full-length anti-sense mRNA sequences. However, degradation of the RNA during preparation or storage can lead to truncated or cleaved mRNA molecules that cause premature stops of oligo(dT)-primed reverse transcription. Hence, mRNA sequences located 5' of cleavage sites are not converted into cDNA and a 3'-biased cRNA target is obtained (Figure 10.1B). To estimate whether such a 3'-bias has occurred during cRNA preparation and to monitor the quality of the RNA used to prepare a cRNA target, Affymetrix expression GeneChips provide a number of probe sets that interrogate the 5', middle, and 3' parts of the transcripts of two housekeeping genes, Gapdh and β-Actin, which are ubiquitously expressed at relatively high levels. Other 5', middle, and 3' probe sets detect different parts of various externally added intact polyA+-spikes, which can be used as external normalization controls and serve to monitor the cDNA- and cRNA-synthesis steps. cRNA targets derived from high quality RNA samples display 3' to 5' probe-set signal ratios for Gapdh and β-Actin close to 1.0, whereas cRNA targets from degraded starting material exhibit increased ratios.
With a few exceptions, where increased 3'/5'-ratios of housekeeping genes may truly reflect a regulated cellular process such as apoptosis, increased 3'/5'-ratios result mostly from incomplete inactivation and removal of endogenous ribonucleases during cell homogenization and RNA extraction, contamination with exogenous ribonucleases or spontaneous cleavage, which is frequently observed in purified RNA samples that have been subjected to multiple freeze and thaw cycles.
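The 3' to 5' signal ratio used as a quality indicator can be sketched in a few lines of Python. The dictionary layout keyed by (gene, region) and the function name are hypothetical and do not correspond to the actual probe set identifiers or to any vendor software.

```python
def three_to_five_ratios(signals):
    """Return the 3'/5' signal ratio for each control gene.

    signals -- dict mapping (gene, region) -> probe set signal, where
               region is "5'", "M" or "3'" (hypothetical layout).
    """
    genes = sorted({gene for gene, _region in signals})
    ratios = {}
    for gene in genes:
        signal_3 = signals[(gene, "3'")]
        signal_5 = signals[(gene, "5'")]
        if signal_5 <= 0:
            raise ValueError(f"non-positive 5' signal for {gene}")
        # Close to 1.0 for intact RNA; larger when 5' sequences are
        # under-represented in the cRNA target.
        ratios[gene] = signal_3 / signal_5
    return ratios
```

The middle probe set is not needed for the ratio itself, but keeping it in the same structure makes it easy to inspect whether the signal falls off gradually towards the 5' end.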

[Figure 10.1 (image not reproduced). Panel A: labeling workflow from 1st strand synthesis through 2nd strand synthesis (RNase H, E. coli DNA ligase, DNA polymerase I, T4 DNA polymerase) to ds cDNA, followed by in vitro transcription (IVT Enzo Kit; T7 RNA polymerase, biotin-NTPs) yielding biotinylated full-length antisense RNA (cRNA). Panel B: biotinylated antisense cRNA copies in the cRNA target from intact mRNA and from partially degraded RNA, with probe set target sequences and relative signal intensities for a 'distant' and a more 3' probe set. Panel C: signal ratio (degraded/control) plotted against distance (bp).]
Figure 10.1.

Differential representation of 5' mRNA sequences in cRNA targets from partially degraded samples. (A) Intact mRNAs are converted by the standard labeling method into biotinylated full-length anti-sense cRNA. (B) Schematic representation of anti-sense cRNA copies of a specific gene generated from an intact (left panel) and a partially degraded RNA sample (right panel). Depending on the position of the cleavage sites in the mRNA molecules, copies lacking variable parts of the 5' region are generated (marked 2 to 4). A probe set interrogating more 3' located sequences would detect copies 1 to 3, whereas a more 5' probe set could detect only 1 to 2, resulting in a variable degree of under-estimation and increased signal ratios when compared to intact samples. cRNA copy 4 is non-productive with respect to both the 5' and 3' probe sets, while 3 is non-productive only with respect to the 5' probe set. (C) Graph showing the relationship of the measured signals in intact and degraded total RNA and the distance between the 5'-end of the probe-set sequences and the polyA end of the transcript. The ratio of signals observed in degraded and control RNA for the AFFX-Gapdh and β-Actin 5', middle, and 3' probe sets is plotted against the distance between the 5'-end of the probe set and the start of the polyA tract. Accession numbers for human Gapdh and β-Actin full-length sequences are M33197 and X00351, respectively.
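The distance dependence shown schematically in panels B and C can be illustrated with a toy model in Python. The model is an assumption introduced here, not something stated in the text: cleavage sites are taken to occur independently and uniformly along the transcript at a constant rate per base, so that a cRNA copy reaches a position only if no cleavage lies between that position and the polyA tract. The function names and the rate parameter are hypothetical.

```python
import math


def surviving_fraction(distance_bp, cleavage_rate_per_base):
    """Expected degraded/control signal ratio for a probe set whose 5'-end
    lies distance_bp upstream of the start of the polyA tract.

    Assumes independent, uniformly distributed cleavage events (toy model).
    """
    if distance_bp < 0 or cleavage_rate_per_base < 0:
        raise ValueError("distance and rate must be non-negative")
    return math.exp(-cleavage_rate_per_base * distance_bp)


def expected_3_to_5_ratio(dist_3prime_bp, dist_5prime_bp, cleavage_rate_per_base):
    """Expected 3'/5' signal ratio under the same toy model."""
    return (surviving_fraction(dist_3prime_bp, cleavage_rate_per_base)
            / surviving_fraction(dist_5prime_bp, cleavage_rate_per_base))
```

Under this assumption the degraded/control ratio falls with increasing distance from the polyA tract, and the 3'/5' ratio grows with the separation between the two probe sets, which is the qualitative behaviour the caption describes. Real degradation is unlikely to be this regular, so the sketch should be read as an aid to intuition only.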

Most GeneChip users agree that array data obtained from RNA samples showing 3'/5'-ratios for β-Actin greater than 3.0 should be treated with special caution and must not be compared with array data from intact control samples, since the differential representation of 5' mRNA sequences might introduce a significant error. While most microarray lab units perform mandatory RNA quality checks by analyzing RNA samples on the Agilent BioAnalyzer or by gel electrophoresis to exclude very poor RNA preparations from further processing, a considerable variation of 3'/5'-ratios is still observed in many projects and experiments. Moreover, because some tissues or cells contain high amounts of ribonucleases or require more time for RNA extraction than others, systematic differences in the RNA quality of samples to be compared are sometimes hard to avoid.
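As a minimal sketch of how the rule of thumb above might be applied to a set of arrays, the following Python function separates samples by their β-Actin 3'/5'-ratio. The threshold of 3.0 is taken from the text; the function name and the input layout (sample name mapped to ratio) are assumptions for illustration.

```python
def split_by_actin_ratio(ratios_by_sample, threshold=3.0):
    """Separate samples by their beta-Actin 3'/5' ratio.

    ratios_by_sample -- dict mapping sample name -> 3'/5' ratio
                        (hypothetical layout)
    Returns (acceptable, caution): two lists of sample names, sorted by name.
    Samples with a ratio strictly greater than threshold go into caution.
    """
    acceptable, caution = [], []
    for sample, ratio in sorted(ratios_by_sample.items()):
        if ratio > threshold:
            caution.append(sample)
        else:
            acceptable.append(sample)
    return acceptable, caution
```

A fixed cut-off only catches strongly degraded samples. It does not address the systematic quality differences between sample groups mentioned above, which can remain even when every sample passes the threshold.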
