## Analysis of Real-Time QPCR and Real-Time QPCR Arrays

### 3.8.1. Theoretical Considerations

Prior to reaching saturation (owing to exhaustion of primers and nucleotides, loss of polymerase activity, and so on), PCR amplification proceeds exponentially and can be described by $N_i = N_0 (1 + k)^i$, where $N_0$ represents the number of target molecules in the original sample and $N_i$ the number of target molecules at cycle $i$ ($i = 1, \ldots, 40$). During the exponential phase, the amplification efficiency $k$ ($0 < k \le 1$) of a given primer pair is constant. Before real-time PCR, it was not easy to identify the exponential phase of the reaction. Either the same reaction was run for different cycle numbers (20, 22, 24, and so on) and the product quantified by gel electrophoresis using the same amount of sample in each case, or different dilutions of sample were used in multiple PCR reactions for the same cycle number.

During real-time QPCR, the amount of product is quantified at each cycle (3). Fluorescence intensity, $R_n$, has a logarithmic dependence on fluorophore (PCR product) concentration, yielding $R_n = \log(N_i) = \log[N_0 (1 + k)^i]$. Real-time quantitative PCR compares two samples with target concentrations $N_a$ and $N_b$ by recording the threshold cycle (CT) of each sample, denoted $a$ and $b$, at which the amplification product yields enough fluorescence to cross an operator-determined threshold $T$ (set at five times the SD of the nontemplate control [NTC]). Consequently, $R_{n,a} = R_{n,b}$ and

$$\log[N_a (1 + k)^a] = \log[N_b (1 + k)^b] \quad\Longrightarrow\quad \log N_a - \log N_b = \log(1 + k)^b - \log(1 + k)^a = \log(1 + k)^{b - a},$$

where $N_a$ and $N_b$ are the input amounts (for $i = 0$, $N_{i=0} = N_0 (1 + k)^0 = N_0$). Ideally, $k = 1$ and $(1 + k) = 2$, i.e., at each cycle two reaction products are produced per target molecule. This leads to $N_i = N_0 (1 + 1)^i = N_0 \, 2^i$. Taking logarithms to base 2, $N_a / N_b = 2^{b - a}$, where $N_a/N_b$ represents the fold difference in mRNA levels of two samples with CT = $a$ and CT = $b$.
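The derivation can be checked numerically. The sketch below is idealized: it assumes a purely exponential phase with constant k and treats the threshold T as an equivalent number of product molecules, whereas on an instrument T is a fluorescence value. The function names and the numbers (an eightfold input difference, k = 0.9) are hypothetical.

```python
import math


def threshold_cycle(n0: float, k: float, threshold: float) -> float:
    """Fractional cycle CT at which N0 * (1 + k)**CT reaches the threshold.

    Idealized exponential phase only: no plateau, constant efficiency k.
    """
    return math.log(threshold / n0) / math.log(1 + k)


def input_ratio(ct_a: float, ct_b: float, k: float = 1.0) -> float:
    """N_a / N_b = (1 + k)**(CT_b - CT_a), from equal signal at the threshold."""
    return (1 + k) ** (ct_b - ct_a)


# Hypothetical inputs: sample a holds 8x more template than sample b, k = 0.9.
k, T = 0.9, 1e10
ct_a = threshold_cycle(8000, k, T)
ct_b = threshold_cycle(1000, k, T)
print(round(ct_b - ct_a, 2), round(input_ratio(ct_a, ct_b, k), 6))
```

Because each cycle multiplies the product by only 1.9, the eightfold input difference appears as a CT gap of about 3.24 cycles rather than the 3 cycles expected for k = 1. Converting with the correct k recovers the ratio of 8, while plugging the same CTs into the k = 1 formula overstates it.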

Hence, the relative abundance of a target in two samples can be extracted from their CT values alone. Interestingly, hybridization-based DNA arrays have similar characteristics, since the color intensity ratio in a fluorescent Cy3/Cy5 DNA array exhibits a logarithmic dependence on the amount of hybridized probe (13). Analogous to the amplification efficiency k for PCR, a hybridization efficiency, K0, applies to DNA arrays; it is a function of the length and base composition of the particular cDNA fragment at a given hybridization temperature.

### 3.8.2. Absolute Quantification for One Primer Pair on Multiple Samples

To quantify the abundance of a single mRNA and/or viral species in diagnostic applications, a standard curve is generated that plots the CT number against the copy number per unit, for instance, copy number per $10^6$ cells or per 1 µg DNA (14). Actual values are interpolated by linear regression analysis (Fig. 6). The slope of the dilution curve defines the amplification efficiency k. A decision is made based on the interpolated copy numbers, and the significance of the observation is established by multiple measurements per sample. Mean, standard deviation (SD), coefficient of variation (CV), and/or confidence intervals can be calculated from the interpolated copy number per sample, and the goodness of fit of the standard curve can be gauged by its coefficient of determination, R² (reviewed in ref. 15). Calculations are performed using Excel or more advanced statistical software such as SPSS (SPSS Science, Chicago, IL). A standard curve also reveals the linear range of real-time QPCR for a particular primer pair, which matters, for example, in viral load assays. This type of validation is ideally suited for the quantification of multiple samples with a single primer pair. However, it is very cumbersome: to maintain maximal accuracy, a standard curve has to be included with each amplification group (96-well plate) and for each primer pair (see Note 10).

Fig. 6. Linear regression of a real-time QPCR primer pair for eastern equine encephalitis virus (EEE). The log of the copy number is plotted on the x-axis against the CT achieved for each dilution on the y-axis. The fitted line is given in the form y = mx + b, and R² is the coefficient of determination. Each dilution was amplified in triplicate.
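The standard-curve workflow above can be sketched in plain Python. This is an illustration under stated assumptions, not the software used by the authors: it fits CT against log10 copy number by ordinary least squares, reports R² as the coefficient of determination, and converts the slope to an efficiency with the common relation E = 1 + k = 10^(-1/slope). The dilution data and function names are hypothetical.

```python
from statistics import mean


def fit_standard_curve(log10_copies, ct):
    """Least-squares fit of CT = slope * log10(copies) + intercept, plus R^2."""
    mx, my = mean(log10_copies), mean(ct)
    sxx = sum((x - mx) ** 2 for x in log10_copies)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_copies, ct))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Coefficient of determination: R^2 = 1 - SS_residual / SS_total
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(log10_copies, ct))
    ss_tot = sum((y - my) ** 2 for y in ct)
    return slope, intercept, 1 - ss_res / ss_tot


def efficiency_from_slope(slope):
    """E = 1 + k = 10**(-1/slope); a slope near -3.32 corresponds to E near 2."""
    return 10 ** (-1 / slope)


def copies_from_ct(ct, slope, intercept):
    """Interpolate the copy number of an unknown sample from its CT."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical 100-fold dilution series (log10 copies) amplified in triplicate.
log_copies = [2, 2, 2, 4, 4, 4, 6, 6, 6]
cts = [31.4, 31.6, 31.5, 24.6, 24.8, 24.7, 17.8, 17.9, 18.0]
slope, intercept, r2 = fit_standard_curve(log_copies, cts)
E = efficiency_from_slope(slope)
print(f"slope={slope:.3f}  E={E:.3f}  k={E - 1:.3f}  R^2={r2:.4f}")
print(f"unknown at CT 22.0: {copies_from_ct(22.0, slope, intercept):.0f} copies")
```

With these made-up triplicates the slope is -3.4, which corresponds to E of about 1.97 (k about 0.97); replicate scatter lowers R² slightly below 1.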

### 3.8.3. Absolute Quantification of Two or More Different Primer Pairs

Standard curves are generated for each primer pair as in Subheading 3.8.2., and actual copy numbers are interpolated for each target. Copy numbers can then be compared for each target over multiple samples using conventional statistics as outlined in Subheading 3.8.2. Furthermore, copy numbers for the two (or more) different targets can be compared with each other. For example, the relative levels of two different mRNAs in the same tissue can be recorded. Calculations are performed using Excel or more advanced statistical software such as SPSS.
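Comparing two targets requires a separate standard curve per primer pair before copy numbers can be set side by side. A minimal sketch, in which the curve parameters, CTs, and the helper name are all hypothetical:

```python
def copies_from_ct(ct, slope, intercept):
    """Invert a target-specific standard curve CT = slope*log10(copies) + intercept."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical standard-curve parameters (slope, intercept), one per primer pair.
curves = {"target1": (-3.3, 40.0), "target2": (-3.5, 42.0)}
# Hypothetical mean CTs of both targets measured in the same tissue sample.
sample_ct = {"target1": 26.8, "target2": 31.5}

# Interpolate each target on its own curve, then compare absolute copy numbers.
copies = {t: copies_from_ct(ct, *curves[t]) for t, ct in sample_ct.items()}
ratio = copies["target1"] / copies["target2"]
print({t: round(c) for t, c in copies.items()}, round(ratio, 2))
```

Here target1 comes out ten times more abundant than target2 (10,000 vs. 1,000 copies), whereas applying 2 to the power of the raw CT difference (4.7 cycles) would have suggested about 26-fold.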

### 3.8.4. Relative Quantification for One Primer Pair

Often transcriptional profiling is concerned only with relative differences between two samples, a and b, which are expressed as unitless fold changes. Hence, raw CT numbers can be used directly. Relative quantification eliminates the intermediate use of a standard curve and allows direct comparison of the fold differences between two target populations; it requires only the data for each sample (16-19). By applying rank-based statistics (Wilcoxon's rank-sum test) or a simple t-test, we can determine, for instance, whether a given tissue or treatment yields a relative (and statistically significant) change in mRNA levels between different samples.
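A minimal sketch of relative quantification for one primer pair, assuming ideal efficiency (E = 2) unless a measured E is passed in. The exact rank-sum p-value is computed by enumerating every possible rank assignment, which is practical only for small replicate numbers; the CT values and function names are hypothetical. In practice a statistics package would normally be used, for example SciPy's `scipy.stats.mannwhitneyu`, which is equivalent to the Wilcoxon rank-sum test.

```python
from itertools import combinations
from statistics import mean


def fold_change(ct_a, ct_b, efficiency=2.0):
    """Fold level of sample a relative to b: E**(mean CT_b - mean CT_a)."""
    return efficiency ** (mean(ct_b) - mean(ct_a))


def rank_sum_exact_p(x, y):
    """Two-sided exact Wilcoxon rank-sum p-value by full enumeration (no ties)."""
    pooled = sorted(x + y)
    if len(set(pooled)) != len(pooled):
        raise ValueError("tied CT values: use a tie-corrected test instead")
    rank = {v: i + 1 for i, v in enumerate(pooled)}
    w_obs = sum(rank[v] for v in x)
    n, total = len(x), len(pooled)
    expected = n * (total + 1) / 2
    # Null distribution: every way of assigning n of the ranks to group x.
    sums = [sum(c) for c in combinations(range(1, total + 1), n)]
    extreme = sum(1 for w in sums if abs(w - expected) >= abs(w_obs - expected))
    return extreme / len(sums)


# Hypothetical triplicate CTs for one primer pair in two samples.
treated = [20.1, 20.3, 19.9]
control = [23.0, 23.2, 22.9]
print(round(fold_change(treated, control), 2), rank_sum_exact_p(treated, control))
```

Note that with three replicates per group the smallest attainable two-sided p-value is 0.1 (2 of 20 rank assignments are as extreme), so a rank test on triplicates cannot reach p < 0.05 however large the CT difference.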

### 3.8.5. Relative Quantification for Multiple Primer Pairs

The unmanipulated CT data for multiple primer pairs and multiple samples can also be used to infer the relative expression pattern of many genes. To do so, we apply hierarchical clustering, as previously described (20). Importantly, the same clustering algorithms that are used for hybridization array analysis can be used to analyze real-time QPCR arrays (Fig. 7). Instead of a gene-by-experiment table of individual spot intensities, as recorded for hybridization arrays (21), a primer-by-experiment table of individual CT values is fed into the program.
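The clustering step can be illustrated with a small average-linkage routine over a primer-by-experiment CT table. Average linkage with 1 minus the Pearson correlation as the distance is one common choice, assumed here for illustration; the text does not specify the exact algorithm or distance, and the gene names and CTs are hypothetical. SciPy offers the same operation through `scipy.cluster.hierarchy.linkage(..., method="average", metric="correlation")`.

```python
from itertools import combinations
from statistics import mean, pstdev


def correlation_distance(u, v):
    """1 - Pearson r between two CT profiles; ignores per-primer offsets."""
    su, sv = pstdev(u), pstdev(v)
    if su == 0 or sv == 0:
        return 1.0  # a flat profile is treated as uncorrelated (assumption)
    mu, mv = mean(u), mean(v)
    cov = mean([(a - mu) * (b - mv) for a, b in zip(u, v)])
    return 1.0 - cov / (su * sv)


def average_linkage(table, n_clusters):
    """Agglomerative average-linkage clustering of primer rows.

    table: primer name -> list of CT values, one per experiment (same order).
    Merging stops when n_clusters groups remain.
    """
    dist = {frozenset(p): correlation_distance(table[p[0]], table[p[1]])
            for p in combinations(table, 2)}
    clusters = [[name] for name in table]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Average of all primer-to-primer distances between the two groups.
            d = mean([dist[frozenset((a, b))] for a in clusters[i] for b in clusters[j]])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return sorted(sorted(c) for c in clusters)


# Hypothetical primer-by-experiment table of raw CTs (four time points).
ct_table = {
    "geneA": [20.0, 22.0, 24.0, 26.0],
    "geneB": [18.0, 20.5, 22.0, 24.5],
    "geneC": [30.0, 28.0, 26.0, 24.0],
    "geneD": [25.0, 23.5, 21.0, 19.5],
}
print(average_linkage(ct_table, n_clusters=2))
```

Because correlation distance ignores each primer pair's baseline CT, primers with different absolute CTs but parallel time courses fall into the same cluster, which is what makes raw CTs usable as input. Remember that a rising CT means falling expression.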

In order for relative quantification to be valid between different primer pairs, three constraints are placed on the amplification efficiency k of each primer pair:

1. The amplification efficiency k or E = (1 + k) must not change with increasing cycle number. This assumption is valid only during exponential amplification, when primers, polymerase activity, or nucleotides do not limit the reaction. Setting the threshold appropriately guarantees that this key assumption is not violated.

2. The amplification efficiency k is constant over a wide range of concentrations (typically five orders of magnitude for real-time QPCR, which determines the linear range of the assay) but may not be accurate for comparing very low or very high levels of target DNA. In contrast to conventional end-point or gel-based PCR methods, real-time QPCR instruments and fluorescent chemistry record the amount of product at each cycle and thus allow direct visual verification of constraints (1) and (2) for each data point.

Fig. 7. Representation of hierarchical cluster analysis. Shades of gray indicate transcription level, with lighter shades representing increased transcription. Groups of genes clustering together are marked by the thick black lines next to the clustogram, and each group is labeled with a letter. Four clusters are shown (A, B, C, and D).

3. The amplification efficiency k determines the spread, i.e., the fold difference in target level into which a given CT difference translates. Under ideal amplification conditions, exactly two molecules are produced per parent molecule at each cycle. This assumption leads to the widely used shortcut for converting CT differences into fold differences: $N_a/N_b = 2^{C_{T,b} - C_{T,a}}$.

Is this a reasonable supposition? Figure 8A visualizes the effect of changes in k by plotting the relative fold difference for various amplification efficiencies E = k + 1. Assuming ideal amplification (k = 1, i.e., E = 2), a CT difference of five cycles between two samples i and j translates into a 32-fold (2^5) difference in input levels. However, if the amplification reaction proceeds with 20% lower efficiency than ideal (k = 0.8, i.e., E = 1.8), CT(i) - CT(j) = 5 represents only a 19-fold (1.8^5 ≈ 18.9) difference. If the PCR efficiency drops to k = 0.6 (E = 1.6), even a 10-cycle CT difference corresponds to only about a 110-fold difference, compared with 1,024-fold under ideal amplification. Since most PCR reactions do not proceed under ideal conditions, assuming k = 1 (E = k + 1 = 2) almost always overestimates the true difference in target levels. This explains some of the large discrepancies in fold induction or suppression observed when DNA hybridization array data were verified by real-time QPCR.
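The numbers in this paragraph follow directly from fold difference = E^ΔCT. The short sketch below tabulates that relation for a few efficiencies and shows how much the E = 2 shortcut overstates a five-cycle difference; the function name is illustrative only.

```python
def fold_difference(delta_ct, efficiency=2.0):
    """Input-level fold difference implied by a CT difference: E**delta_ct."""
    return efficiency ** delta_ct


for e in (2.0, 1.9, 1.8, 1.6):
    row = [fold_difference(d, e) for d in (1, 5, 10)]
    # Overestimate incurred by assuming E = 2 when the true efficiency is e.
    over = fold_difference(5) / fold_difference(5, e)
    print(f"E={e}: " + "  ".join(f"{v:8.1f}" for v in row) + f"  overestimate@5: {over:.2f}x")
```

For E = 1.8, for example, the E = 2 shortcut overstates a five-cycle difference by about 1.7-fold (32 vs. 18.9).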

Multiple-primer real-time QPCR arrays compound this problem, since the aim is to compare many different primer pairs with each other. Strictly, this is possible only under the additional constraint that the amplification efficiencies Ea and Eb of any two primer pairs a and b in the array do not differ. Otherwise, a comparison between different primer pairs (measuring the transcription profiles of different mRNAs) is impossible without first determining the standard curve for each primer pair j, j = 1, ..., m, and then comparing fold differences obtained after absolute quantification. Surprisingly, however, most primer pairs have very similar amplification efficiencies. We typically determine the amplification efficiency of each primer pair once, by dilution, and exclude primer pairs that amplify with E < 1.8 (Fig. 8B). According to the considerations in Fig. 8A, the maximal error introduced by different amplification efficiencies in this case is twofold, or one CT unit. This is less than the experimental error in most cases. More elaborate schemes have been, and are still being, developed to compare multiple primers (16,17). For transcription profiling, however, in which only the change between any two samples is recorded (e.g., an increase over time), relative clustering of the CT values will easily discern different response classes (see Note 11).
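The efficiency filter and the size of the resulting mismatch error can be sketched as follows. The primer names and efficiencies are hypothetical, and the error model simply compares E^ΔCT for an assumed and a true efficiency. How large the error becomes depends on the CT difference being converted: for a primer pair sitting at the E = 1.8 cutoff, assuming E = 2 produces a twofold error once the CT difference reaches about 6.6 cycles.

```python
import math


def passes_efficiency_qc(efficiencies, e_min=1.8):
    """Keep primer pairs whose dilution-derived efficiency E is at least e_min."""
    return {p: e for p, e in efficiencies.items() if e >= e_min}


def max_error_factor(e_true, delta_ct, e_assumed=2.0):
    """Fold error from assuming e_assumed when the primer amplifies with e_true."""
    return (e_assumed / e_true) ** delta_ct


def delta_ct_for_error(e_true, error=2.0, e_assumed=2.0):
    """CT difference at which the efficiency mismatch reaches the given fold error."""
    return math.log(error) / math.log(e_assumed / e_true)


# Hypothetical dilution-derived efficiencies for five primer pairs.
measured = {"orf1": 1.97, "orf2": 1.85, "orf3": 1.72, "orf4": 1.91, "orf5": 1.80}
kept = passes_efficiency_qc(measured)
print(sorted(kept), round(delta_ct_for_error(1.8), 2))
```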

Finally, it is important to realize that the error for each primer pair depends largely on experimental and handling error alone, not on the amplification efficiency or the amount of input sample (Fig. 8C).

Fig. 8. Amplification efficiency of real-time QPCR. (A) Visualization of the effect of changes in amplification efficiency when comparing fold differences between primers. Theoretical normalized CT values are plotted on the x-axis against relative fold difference on the y-axis. Ideal amplification is represented by the solid black line; over- and underestimates are shown with dotted black lines. (B) Amplification efficiencies for 91 primers. The gray histogram shows the number of primers for a given amplification efficiency; the cumulative probability is overlaid. (C) Error is independent of amplification efficiency, as shown by plotting amplification efficiency (x-axis) against the standard deviation (y-axis) for each primer in the KSHV 96-primer array.


### 3.8.6. Normalization

For real-time QPCR, two types of normalization can be applied: type I normalization, relative to a reference sample (t0) or to the median for each gene, yields dCT; type II normalization, relative to a reference gene such as GAPDH, yields ΔCT. The latter eliminates differences caused by variation in the overall input cDNA concentration. Using experimental samples (e.g., response to a particular drug in culture), one should set up the experiment and normalize the

Table 1

Normalization Possibilities for Two Genes (A and B)

Time

Time


 Gene
0 0