[Figure legend: ■ Brain U74Av2 03/03: 103026J_at: 522351; □ Brain U74Av2 03/03: 104171_f_at: 574471]

mismatches for the sequence of other strains. In rare cases, the mismatch for one strain may be a match for another strain, reversing the polarity of a probe level QTL effect. Probe design is a complex process with multiple constraints. Identification of SNPs in the probes will facilitate the development of probe level filters to enhance the estimation of transcript abundance. In our comparison of probe sequences with SNPs reported in the Celera Genomics database (Celera Discovery System, Celera Genomics, Rockville, Maryland [7/01/03]), we verified SNPs in fewer than 1500 of the approximately 390,000 probes on the U74Av2 array.
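A probe level SNP filter of the kind described above could be sketched as follows. This is a minimal illustration, not the procedure used for the Celera comparison: the input layout (probe ID, chromosome, start coordinate) and the function name `probes_with_snps` are hypothetical, and probes are assumed to be 25 nucleotides long with coordinates on the same assembly as the SNP positions.

```python
from bisect import bisect_left

PROBE_LENGTH = 25  # oligonucleotide probe length assumed for this sketch


def probes_with_snps(probes, snp_positions):
    """Return IDs of probes whose 25-nt footprint contains a known SNP.

    probes        -- list of (probe_id, chrom, start) tuples (hypothetical layout)
    snp_positions -- dict mapping chrom -> sorted list of SNP coordinates
    """
    flagged = []
    for probe_id, chrom, start in probes:
        positions = snp_positions.get(chrom, [])
        # First SNP at or after the probe start, found by binary search.
        i = bisect_left(positions, start)
        if i < len(positions) and positions[i] < start + PROBE_LENGTH:
            flagged.append(probe_id)
    return flagged


# Toy data only; these coordinates are invented for illustration.
toy_probes = [("p1", "chr1", 100), ("p2", "chr1", 200), ("p3", "chr2", 50)]
toy_snps = {"chr1": [110, 225]}
flagged_ids = probes_with_snps(toy_probes, toy_snps)
```

Flagged probes could then be excluded or down-weighted before transcript abundance is estimated; which choice is preferable depends on how many probes remain in the probe set.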

C. Probe Redundancy and Overlap

Another factor complicating the use of individual oligonucleotide probes on the Affymetrix platform is that a small subset are redundant and recognize multiple gene products. Of 196,670 perfect match probe sequences on the murine U74Av2 array, 762 probe sequences are duplicated at least once; some as many as eight times. In addition, 1659 probes are completely redundant with at least one other probe. In the case of the gamma crystallins D, E, and F, there are six probes that are common to the three probe sets, 12 probes that occur in two of the three probe sets, and 12 probes (4 each) that are unique to each of the three transcripts. The amount of variability observed for each replicate probe on the array is proportional to the intensity of probe hybridization. For the most highly expressed genes, a high degree of consistency to the QTL maps is observed, whereas for low-affinity probes, QTL maps for identical probes can vary somewhat (Fig. 10). For some related gene products, virtually the entire probe set overlaps, for example, vomeronasal receptors 8 and 9, which share 15 or 16 probes, rendering discrimination of the two transcript expression levels virtually impossible. The presence of multiple identical probes can help reduce noisy estimates and improve precision of array regional normalization methods. In other cases, the probes overlap extensively, so as many as 24 of 25 nucleotides overlap with the adjacent probe (Fig. 11). Naturally, these probes often have highly correlated expression levels. If an SNP or other distinguishing physical property is confined to a subset of the probes, or if exon variability is manifest in the probe hybridization, the overlap can lead to a biased assay of expression levels.
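The two kinds of redundancy discussed here, identical probes shared among probe sets and adjacent probes that overlap by most of their length, can be detected with a few lines of code. The sketch below is illustrative only: the function names and the (probe set ID, sequence) input layout are assumptions, and the toy sequences are invented rather than taken from the U74Av2 array.

```python
from collections import defaultdict


def duplicated_probes(probes):
    """Map each repeated probe sequence to the probe sets that contain it.

    probes -- iterable of (probe_set_id, sequence) pairs (hypothetical layout)
    """
    by_seq = defaultdict(list)
    for probe_set, seq in probes:
        by_seq[seq.upper()].append(probe_set)
    # Keep only sequences that occur more than once.
    return {seq: ids for seq, ids in by_seq.items() if len(ids) > 1}


def tiling_overlap(left, right):
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


# Toy example: two probe sets share one probe sequence.
toy = [("setA", "ACGTACGT"), ("setB", "acgtacgt"), ("setC", "TTTTACGT")]
shared = duplicated_probes(toy)

# Two 25-mers tiled one nucleotide apart share 24 of 25 nucleotides.
template = "ACGTTGCAAGCTTAGGCTAACGTTAGC"
overlap = tiling_overlap(template[0:25], template[1:26])
```

Probes reported by `duplicated_probes` play the role of the replicate probes mapped in Fig. 10, while a high `tiling_overlap` value marks neighboring probes whose signals should not be treated as independent measurements.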

Fig. 10. Interval mapping for probes that are shared among several probe sets. These can be thought of as analogous to spot replicates on a complementary DNA array. For abundant (strongly hybridizing) probes consistency is high. As abundance/hybridization decreases, variance is higher and probe level mapping becomes more inconsistent. (See Color Insert.)

D. Implications of Probe Level Variation and Normalization Techniques for QTL Analysis

Various normalization procedures have been devised to account for differential probe level hybridization effects. This is an actively developing area of analysis, and different analytical methods may be required for different purposes. Some of these algorithms adjust for array-related technical variance simultaneously, whereas others do so in a separate stage of analysis. The various algorithms exhibit different performance for different RNA abundances. The Affymetrix MAS 5.0 algorithm uses a Tukey bi-weight average of the differences between perfect match (PM) and those mismatch (MM) probes that do not exceed the perfect match (Affymetrix, 2001). This approach is problematic, particularly when genetic variation results in differences in probe level hybridization and when the PM and MM probes show very similar hybridization. Robust multichip average (RMA) uses a background correction, quantile normalization, and linear model fit using robust methods, taking advantage strictly of the PM data (Irizarry et al., 2003). Li and Wong's (2001) dChip uses an invariant gene set to compute a model-based expression index. Position-dependent nearest neighbor (PDNN) takes into account the stacking energy of probe level hybridization reactions (Zhang et al., 2003). Each of these algorithms can produce very different gene-specific QTL maps. An unpublished comparison of the overall mapping results for each of the methods by Dr. Kenneth Manly and colleagues reveals that there are more than 600 transcripts for which the peak of linkage is the same for all methods, although these peaks are of varying statistical significance. Additionally, for the significant QTLs by any method, the peak location of linkage does not vary by normalization method for a subset of approximately 100 loci (Fig. 12). This indicates that although these normalization methods can be quite consistent with one another, some may be conservatively biased.
This is because normalization must be performed at each locus in the mapping analysis. Adjusting data before analysis by a cofactor will in most cases reduce the magnitude of the estimate of the cofactor effect (Darlington and Smulders, 2001). Normalizing the arrays before mapping by either using strain as a grouping variable, or not using any grouping variable in the mapping analysis can either increase the noise or reduce the signal, respectively, in expression estimates by genotype. Both of these approaches introduce conservative bias, evidenced in the p value distributions for mapping, which demonstrate greater than expected numbers of high p values. This phenomenon is most extreme in RMA, followed by PDNN and MAS 5.0. The use of simultaneous normalization and mapping is computationally intensive (an entire microarray group comparison analysis must be performed at each locus) but is not impossible.
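Two of the building blocks named above, the Tukey bi-weight average and quantile normalization, can be illustrated in isolation. The sketch below is not a reimplementation of MAS 5.0 or RMA: it omits background correction, mismatch handling, and model fitting, the tuning constants `c=5.0` and `eps=1e-4` are assumptions, ties in quantile normalization are broken arbitrarily, and the function names are hypothetical.

```python
from statistics import median


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey bi-weight location estimate (robust average)."""
    values = list(values)
    m = median(values)
    mad = median([abs(v - m) for v in values])  # median absolute deviation
    weights = []
    for v in values:
        u = (v - m) / (c * mad + eps)
        # Points far from the median receive zero weight.
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def quantile_normalize(arrays):
    """Give every array the same empirical distribution (mean of sorted values)."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(s[i] for s in sorted_arrays) / len(arrays) for i in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # indices from low to high
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# The outlier 10.0 gets zero weight, so the estimate stays close to 1.0.
robust_mean = tukey_biweight([1.0, 1.1, 0.9, 1.0, 10.0])

# Two toy arrays of three probes each.
normalized = quantile_normalize([[5, 2, 3], [4, 1, 6]])
```

Because quantile normalization pools information across all arrays before any genotype is considered, running it once ahead of mapping is an instance of the pre-mapping adjustment discussed above; normalizing at each locus would require repeating the step with genotype as a grouping variable.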

The table below lists information on all probes of probe set 100573_f_at from the database UTHSC Brain mRNA U74Av2 (March03).

[Table not recovered. Column headings: Tm (°C); Stacking Energy (gsb, nsb).]
