Screening methods for known and unknown SNPs

Most array-based approaches are aimed at the detection of known SNPs (22, 23). To extend SNP detection to unknown variants, high-density oligonucleotide arrays have been designed that cover all possible sequence permutations of the genomic region of interest and function as an array sequencing method (24-27). These arrays are composed of oligomer probes that overlap at one-nucleotide intervals (the one-nucleotide tiling principle) and together cover the genomic segment investigated. Each position is represented by four alternative nucleotides so that all possible substitutions are covered. Thus, for a given genomic fragment, the number of oligos needed to identify any possible SNP equals the number of base pairs in the fragment times four. Although powerful, this approach is limited by the complexity of its design and by the cost of producing and using the chips. In particular, the cost of array preparation is disproportionate for genomic areas with a low density of SNPs.

Single base chain extension (SBCE) uses the dideoxynucleotide chain-termination principle: each probe is extended by a single, differentially labeled dideoxynucleotide that reports the template base immediately downstream of the probe. Using a one-nucleotide tiling design, it is possible to achieve results similar to those obtainable with array sequencing using one quarter of the probes for a given genomic region.
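The probe counts implied by the two designs described above can be sketched as follows. This is a minimal illustration, assuming a single strand is interrogated and control probes are ignored; the function names and the fragment length are hypothetical.

```python
def array_sequencing_probes(fragment_bp: int) -> int:
    """One-nucleotide tiling with four alternative bases per position."""
    return fragment_bp * 4


def sbce_probes(fragment_bp: int) -> int:
    """One probe per position; the labeled dideoxynucleotide identifies the base."""
    return fragment_bp


fragment_bp = 1000  # hypothetical fragment length, for illustration only
print("array sequencing:", array_sequencing_probes(fragment_bp))
print("SBCE:", sbce_probes(fragment_bp))
```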

A simplified screening tool that could discriminate conserved from polymorphic genomic regions or could identify rare individuals carrying unusual SNPs could dramatically increase the efficacy of allele-specific sequencing or targeted SNP identification.

Designing a simplified SNP screening array

The strategy described here combines the well-established allele-specific hybridization method with a chromosome walking approach for the genome-wide screening of known and unknown SNPs. This approach requires four times fewer oligos than the SBCE method and eight times fewer oligos than the array sequencing method to investigate a given fragment of genomic DNA. Thus, this strategy should be considered a screening tool for the investigation of unexplored areas of the human genome before extensive sequencing efforts or the construction of high-density oligo arrays. Known SNPs can be identified, while genomic regions containing unknown SNPs can be flagged and subsequently annotated by SBT. In this way, large chromosomal segments are screened and the few regions where SNPs are present are identified, so that the amount of sequencing required for their definitive characterization is drastically reduced. In addition, as new alleles are identified, new allele-specific oligonucleotides can be added incrementally to the array for definitive typing purposes.

The SNP screening array includes two types of oligos: consensus oligos (covering the full length of the genomic fragment investigated) and allele-specific oligos (representing known variants of the empirically selected consensus sequence of the genomic area to be investigated). Both types of oligos are 18 nucleotides long. Consensus oligos are designed according to the sequence of an arbitrarily selected reference DNA and cover, at a four-nucleotide tiling interval, the full sequence of the genomic area investigated. In a 20-nucleotide oligo (approximately the length of the oligos used in this array), the position of the SNP has only a minor effect on hybridization efficiency compared with the kind of nucleotide change, as long as the three outermost positions at either end of the oligo are avoided (28). Therefore, the four-nucleotide tiling design ensures that any SNP within the region of interest should be identifiable by at least one consensus oligo (Plate 3). Furthermore, because the consensus oligonucleotides overlap, each SNP is interrogated by four or five consecutive consensus oligonucleotides in which it occupies a different position. These provide references against which the information derived from the allele-specific oligonucleotides can be compared, facilitating the interpretation and discrimination of allele-specific hybridization in hetero- or homozygous conditions. A fluctuating hybridization pattern of the consensus oligonucleotides surrounding an unknown SNP indicates its existence, which can subsequently be confirmed by sequencing limited to the individual carrying the variant and to the region of interest (29).
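The tiling geometry described above can be checked with a short sketch. It assumes 0-based coordinates, 18-mer oligos placed every four nucleotides from the start of the fragment, and a margin of three bases at each oligo end; the function names and the 100-nt fragment are hypothetical.

```python
def tile_starts(fragment_len, oligo_len=18, step=4):
    """Start offsets (0-based) of consensus oligos tiled along the fragment."""
    return list(range(0, fragment_len - oligo_len + 1, step))


def oligos_covering(pos, starts, oligo_len=18, margin=0):
    """Starts of oligos that contain `pos`, ignoring `margin` bases at each oligo end."""
    return [s for s in starts
            if s + margin <= pos <= s + oligo_len - 1 - margin]


starts = tile_starts(100)  # hypothetical 100-nt fragment
for pos in (40, 41, 42, 43):
    total = len(oligos_covering(pos, starts))
    interior = len(oligos_covering(pos, starts, margin=3))
    print(f"position {pos}: {total} oligos overall, {interior} away from the oligo ends")
```

In this sketch every base away from the fragment ends falls within four or five oligos, and at least one of them places the base outside the three outermost positions.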

Allele-specific oligonucleotides of the same length as the consensus oligonucleotides are selected from available databases, with the SNP interrogation site placed at the centermost position of the oligo. With this strategy, the SNP screening array can be used to spot novel SNPs while identifying known ones.

Detection system

The screening array uses two differentially labeled fluorochromes for proportional hybridization testing. The reference sample is arbitrarily selected and consistently used for all arrays. For instance, a cell line (possibly homozygous for the genomic site of interest) that can be continuously expanded is a good reference sample. Complete sequence information about the region investigated can be obtained by sequencing the reference cell line, and this sequence can be used to design the consensus oligonucleotides. For larger genomic segments, or for arrays covering different chromosomal regions, any selected reference will most likely include heterozygous sites. This is acceptable as long as the information is documented and the reference is kept constant, which allows interpretation of the experimentally obtained data. The reference sample exemplified here consists of a cell line with homozygous SNP loci identical to the consensus sequence (a,a). Test and reference samples are amplified by PCR followed by in vitro transcription to generate single-stranded RNA. Array data are generated from cohybridization of fluorescence-labeled reference (e.g. Cy3, green) and test (e.g. Cy5, red) cDNA samples to consensus and allele-specific oligos, the latter representing known SNPs (variant oligos). Results are represented as the log base 2 of the fluorescence intensity ratio (Log2Ratio). In diploid organisms, four combinations can occur: (i) homozygosity at a certain locus of the test sample identical to the consensus (a,a) (consensus oligos Log2Ratio = 0); (ii) homozygous SNP alleles that differ from the homozygous consensus (b,b) (allele-specific and the corresponding consensus oligo Log2Ratio >1); (iii) heterozygosity with one allele identical to and one allele different from the consensus (a,b) (allele-specific oligo Log2Ratio >1 while the corresponding consensus oligo Log2Ratio <1); (iv) heterozygosity with both alleles different from the consensus (b,c) (consensus oligo Log2Ratio >1 and no hybridization to the allele-specific oligo). In regions containing unknown SNPs, for which no allele-specific oligos have been designed, competitive hybridization between reference and test sample occurs only at the consensus oligo. Because the reference sample is perfectly complementary to the consensus oligo, exclusive hybridization of the reference sample indicates the presence of a new SNP at that specific position (Plate 4).
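The Log2Ratio calculation can be sketched as below. This is an illustration only: it assumes the ratio is taken as test (Cy5) over reference (Cy3), that intensities are already background-subtracted, and that a small floor is applied to avoid taking the logarithm of zero. The function name, the floor and the example intensities are hypothetical.

```python
import math


def log2_ratio(test_cy5, reference_cy3, floor=1.0):
    """Log base 2 of the test/reference fluorescence intensity ratio for one probe."""
    test = max(test_cy5, floor)          # hypothetical floor guards against log(0)
    reference = max(reference_cy3, floor)
    return math.log2(test / reference)


# Made-up intensities for three probes
print(log2_ratio(1000, 1000))  # equal signal in both channels
print(log2_ratio(4000, 1000))  # test signal dominates
print(log2_ratio(500, 2000))   # reference signal dominates
```

Under these assumptions, a probe bound mainly by the reference sample gives a negative value, which is the pattern the text associates with an unknown SNP under a consensus oligo.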

Although this conceptually applies to the whole genome, in loci containing more than one polymorphic site various combinations can occur simultaneously. This approach has been validated using the polymorphic HLA gene complex to exemplify the various combinations (29). Various permutations of homozygosity and heterozygosity have been illustrated, and the corresponding consensus hybridization produces complex hybridization patterns that are highly specific for a particular phenotype. In these highly polymorphic conditions, each haplotype combination maintains a highly reproducible profile characterized by minimal variance. This allows the creation of "genotypic masks" within narrow ranges of variation to "fingerprint" known haplotype permutations for high-throughput typing of highly polymorphic genes.

The power of this strategy in identifying unknown SNPs was analyzed using Relative Operating Characteristics (ROC) analysis, which characterizes the performance of a binary classification model across all possible trade-offs between the false negative and false positive classification rates and allows the performance of multiple classification functions to be visualized and compared simultaneously (30). For each 18-mer probe, starting from the third base and ending at the 16th base, if the test target contained at least one nucleotide different from the consensus sequence, the region was defined as SNP(+); otherwise it was SNP(-). When the test sample was most different from the consensus reference, as for b,b and b,c, higher accuracy was observed, with a sensitivity of 82% and a specificity of 96%. The worst accuracy was noted when test and reference samples were closest, as in the case of a,b heterozygosity (sensitivity of 82% and specificity of 82%). The most informative analysis was, however, provided by using data from all the possible combinations, since in most common experimental conditions the relationship between test and consensus sample is not known and, therefore, all possible allelic combinations should be expected. In that case an optimal threshold of Log2Ratio less than or equal to -0.62 yielded a sensitivity of 82% and a specificity of 89%. Thus, this strategy may identify about four out of five unknown SNPs with roughly 90% specificity, the chance of false positive results being highest when an a,b heterozygous sample is tested.
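The thresholding step and the two performance measures can be illustrated with a small sketch. Only the -0.62 cutoff comes from the text; the function names, the Log2Ratio values and the SNP(+)/SNP(-) labels are made up for illustration and do not reproduce the published analysis.

```python
THRESHOLD = -0.62  # pooled-genotype cutoff quoted in the text


def flag_snp(log2_ratios, threshold=THRESHOLD):
    """Call a consensus probe SNP(+) when its Log2Ratio is at or below the cutoff."""
    return [r <= threshold for r in log2_ratios]


def sensitivity_specificity(flags, truth):
    """Sensitivity and specificity of the calls against known SNP status."""
    tp = sum(f and t for f, t in zip(flags, truth))
    tn = sum((not f) and (not t) for f, t in zip(flags, truth))
    fp = sum(f and (not t) for f, t in zip(flags, truth))
    fn = sum((not f) and t for f, t in zip(flags, truth))
    return tp / (tp + fn), tn / (tn + fp)


# Made-up consensus-probe ratios and their true SNP status
ratios = [-1.40, -0.70, -0.10, 0.05, -0.65, -0.30]
truth = [True, True, False, False, False, True]

flags = flag_snp(ratios)
sens, spec = sensitivity_specificity(flags, truth)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Repeating the calculation over a range of cutoffs gives the points of an ROC curve, from which a threshold can be chosen.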

Genomic DNA amplification for array analysis

High quality and a sufficient quantity of genomic DNA are critical for high throughput approaches. In the case of clinical samples or biopsies where limited material is available, the amount of isolated DNA is in most cases far below the requirement for multiple genomic analyses. Therefore, high fidelity genomic DNA amplification becomes the first challenge for accurate genotyping. Depending on the genotyping method employed, DNA amplification can be allele-specific, gene-specific or whole genome-wide. Table 6.1 summarizes amplification methods according to their underlying principle and their advantages and disadvantages.

Here, only the T7-based gene-specific amplification method suitable for SNP screening array analysis is presented. This strategy can be applied to the study of any locus using PCR in combination with in vitro transcription. Gene-specific primers flanking the gene of interest within a 1- to 10-kb range can be designed. Multiple primer pairs are needed when the targeted gene is larger than 10 kb or when multiple genes are scanned simultaneously. To generate single-stranded targets, a T7 promoter sequence (5' aaa cga cgg cca gtg aat tgt aat acg act cac tat agg cgc 3') is attached to the 5' end of the forward primer for PCR amplification. In vitro transcription then generates large quantities of linear single-stranded RNA for fluorescence labeling and hybridization (see Protocol 6.1).
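Attaching the T7 promoter sequence to a forward primer can be sketched as simple string handling. The T7 sequence is the one quoted above; the gene-specific primer is a placeholder rather than a real primer, and the function name is hypothetical.

```python
# T7 promoter sequence as quoted in the text, with the spacing removed
T7_TAIL = "aaa cga cgg cca gtg aat tgt aat acg act cac tat agg cgc".replace(" ", "")


def add_t7_tail(forward_primer: str) -> str:
    """Return the forward primer with the T7 sequence attached at its 5' end."""
    return T7_TAIL + forward_primer.lower()


# Placeholder gene-specific sequence, for illustration only
forward_primer = "acgtacgtacgtacgtacgt"
tailed = add_t7_tail(forward_primer)
print(f"5' {tailed} 3' ({len(tailed)} nt)")
```

Only the forward primer carries the tail in this sketch, as in the description above.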

6.5 Summary

The current SNP scanning array represents a potentially powerful and efficient strategy for high-throughput screening of genes about whose polymorphism little is known. This strategy could also be used to identify mutations in disease genes or to type known allelic variants of well-characterized genes such as HLA. This, however, would require the specialized design of numerous oligos encompassing known variants, as well as supporting software for efficient data interpretation. Various scenarios have been exemplified using exon 2 of the HLA-A locus as a model to identify an unknown allele as well as a known allele in a,a, b,b, a,b and b,c homo- and heterozygous conditions (29). However, SNPs occur in the human genome on average every 600-2000 bases (1, 31). Therefore, most genes are characterized by a relatively narrow range of polymorphisms that would allow a relatively simple design of oligo-array chips and interpretation of results. Independently of the genomic region investigated, this strategy can identify unknown variants through the observation of disproportionately depressed Log2Ratios of the signals obtained at the position of consensus oligonucleotides. It may therefore greatly improve the ability to screen different genes for the frequency and location of polymorphic sites, which can then be confirmed by site-directed sequencing limited to the region of interest. The best application of this strategy stems from the clinical need to rapidly segregate genes according to the presence or lack of polymorphisms in their coding or regulatory regions that may affect clinical phenotypes. A good example of such an application is the screening of cytokines, chemokines and their receptors, whose polymorphism(s) have been associated with individual predisposition to immune pathology, survival of transplanted organs and predisposition to cancer (22, 32-34).

Table 6.1. Methods for genomic DNA amplification

SNP-specific amplification
Amplification: PCR
Primer used: Allele-specific primers with the SNP position at the 3' end
Advantage: SNP-specific
Disadvantage: One SNP per reaction
Reference: -

Gene-specific amplification
Amplification: PCR plus T7 IVT
Primer used: Gene-specific primer, 5' primer with attachment of T7
Advantage: Final products are single-stranded RNA or cDNA, which reduces the complexity of competitive hybridization and enhances the specificity and efficiency of hybridization
Disadvantage: Multiple primers needed for broader coverage of the genome
Reference: (29)

Random genome amplification
Amplification: -
Primer used: Random primer with extended PCR primer sequence
Advantage: Potential coverage of all SNPs
Disadvantage: Low efficiency of amplification and nonspecific artifacts
Reference: (35)

Linear T7 genomic amplification
Amplification: Restriction fragmentation, poly dT tailing and dA-T7 IVT
Primer used: Oligo dA-T7 primer
Advantage: Potential coverage of all SNPs, less bias and large quality products
Disadvantage: Relatively short fragments and incomplete coverage of loci
Reference: (36)

Linker-adapter PCR
Amplification: Restriction fragmentation, adaptor ligation and PCR
Primer used: Adaptor-specific PCR primer
Advantage: Potential coverage of all SNPs, validated and commercially available kits, 200-700 bp size
Disadvantage: Missing SNPs in proximity of the restriction site
Reference: (37)

φ29 multiple displacement amplification (φ29 MDA)
Amplification: φ29 polymerase-based random amplification
Primer used: Phosphorothioate-modified random primer
Advantage: Low error rate (3 x 10^-6), 99.82% genome coverage
Disadvantage: Amplified DNA >10 kb; further amplification and fragmentation are needed
Reference: (38, 39)

Degenerate oligonucleotide-primed PCR (DOP-PCR)
Amplification: Random PCR
Primer used: -
Advantage: One primer, low cost, robust amplification with reduced genome complexity
Disadvantage: Only 48% of the validated SNPs could be genotyped and 22% of the predicted products are not amplified
Reference: (40-42)

Primer extension pre-amplification (PEP)
Amplification: Random PCR
Primer used: Random mixture of 15-base oligo primers
Advantage: Simple PCR reaction
Disadvantage: 78% representative of genome and high error rate (1 x 10^-3)
Reference: (43, 44)

OmniPlex amplification
Amplification: Random DNA fragmentation followed by OmniPlex library generation with universal flanking adaptors, followed by PCR
Primer used: Universal primer
Advantage: One primer, 99.8% concordance in SNP genotyping and 90% representative of original genome
Disadvantage: Less than 10 ng of input DNA can reduce representation significantly
Reference: (45)
