# Understanding The Numbers Used In Population Data Publications

At the end of allele frequency tables in forensic population data publications is usually a list of values that can be used to evaluate the relative usefulness of each STR marker that has been typed. These measures of the evaluated DNA markers include the power of discrimination (PD), the power of exclusion (PE), the a priori chance of exclusion (CE), the polymorphism information content (PIC), and a marker's heterozygosity (H).

Unfortunately, authors publishing such data fail to describe what the various measures of DNA markers mean or how the values are related to one another either because of lack of space or lack of understanding. Typically the numbers for each of these statistical measures are generated by a computer program and therefore the user does not need to think about what the values indicate. Table 20.7 attempts to put a number of calculated genetic functions into context with one another.

Remember that the proportion of homozygotes (h) plus the proportion of heterozygotes (H) equals 100% of the samples tested. Thus, since $h + H = 1$, then $H = 1 - h$ and $h = 1 - H$.

Heterozygosity (H) is simply the proportion of heterozygous individuals in the population. It is calculated by dividing the number of heterozygous samples by the total number of samples (Weir 1996, pp. 141-150). A higher heterozygosity means that more allele diversity exists and therefore there is less chance of a random sample matching. Edwards et al. (1992) described the following formula for calculating an unbiased estimate of the expected heterozygosity:

$$\hat{H} = \frac{n}{n-1}\left(1 - \sum_{i=1}^{k} \hat{p}_i^{2}\right)$$

where $n_1, n_2, \ldots, n_k$ are the allele counts of $k$ alleles at a locus in a sample of $n$ genes drawn from the population and $\hat{p}_i = n_i/n$ is the allele frequency.

### Table 20.7

Summary of formulas and calculations used to compute various parameters for population data analyses. The worked example information was generated from the D13S317 allele data contained in Table 20.2. The symbols below are according to the following key: $P_i$ is the frequency of the $i$th allele in a population of $n$ samples; $x_i$ is the frequency of the $i$th genotype; $h$ = homozygosity; $H$ = heterozygosity.

| Genetic function and formula | Worked example (D13S317 data from Table 20.2) |
|---|---|
| Homozygosity: $h = \sum_{i} P_i^2$ | 0.2154 |
| Heterozygosity: $H = 1 - h = 1 - \sum_{i} P_i^2$ | 0.7845 |
| Effective number of alleles: $n_e = \dfrac{1}{h} = \dfrac{1}{\sum_{i} P_i^2}$ | 4.6413 |
| […] | 0.7556 |
| […] | 0.8896 |
| $PE = 1 - 2\sum_i P_i^2 - 2\left(\sum_i P_i^2\right)^2 + 2\sum_i P_i^4 + \sum_i P_i^3 - 3\sum_i P_i^5 + 3\sum_i P_i^2 \sum_i P_i^3$ | 0.5910 |
| Probability of Identity (Matching Probability): $P_I = \sum_{i} x_i^2$ | 0.0771 |
| Paternity Index: $\text{PI} = \dfrac{H + h}{2h} = \dfrac{(1 - h) + h}{2h} = \dfrac{1}{2h} = \dfrac{1}{2\sum_i P_i^2}$ | 2.3207 |

Gene diversity, often referred to as expected heterozygosity, is defined as the probability that two randomly chosen alleles from the population are different (Weir 1996, pp. 150-156).
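To make the relationship between these quantities concrete, the following minimal sketch computes homozygosity, heterozygosity, the effective number of alleles, and an unbiased heterozygosity estimate from allele counts. The counts are hypothetical (they are not the D13S317 data of Table 20.2), the function name `diversity_stats` is invented for illustration, and the unbiased estimate is assumed to take the common n/(n - 1) small-sample correction.

```python
def diversity_stats(allele_counts):
    """Return h, H, n_e and an unbiased H estimate from allele counts."""
    n = sum(allele_counts)                      # total number of genes sampled
    freqs = [c / n for c in allele_counts]      # p_i = n_i / n
    h = sum(p * p for p in freqs)               # homozygosity: sum of p_i^2
    H = 1 - h                                   # expected heterozygosity (gene diversity)
    n_e = 1 / h                                 # effective number of alleles
    H_unbiased = n / (n - 1) * (1 - h)          # assumed n/(n-1) correction
    return {"h": h, "H": H, "n_e": n_e, "H_unbiased": H_unbiased}


# Hypothetical allele counts for a four-allele locus (100 genes in total).
counts = [10, 20, 30, 40]
stats = diversity_stats(counts)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

The correction factor approaches 1 as the sample grows, so the unbiased estimate and the plain value of H differ noticeably only for small samples.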

Power of discrimination (PD), or probability of discrimination, was first described by Fisher (1951). PD is equal to 1 minus the sum of the squares of the genotype frequencies; that is, $PD = 1 - P_I$, where $P_I$ is the probability of identity (see below).

Power of exclusion (PE), or probability of exclusion, was first described by Fisher (1951) and may be determined by the formula $PE = H^2\left(1 - (1 - H)H^2\right)$, where $H$ = heterozygosity.
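As an illustration, the sketch below evaluates the heterozygosity-based formula exactly as printed above, alongside the allele-frequency expression for PE read from Table 20.7. The function names are hypothetical and the input values are illustrative only. The two expressions are distinct and are not expected to return the same number for a given locus.

```python
def pe_from_heterozygosity(H):
    """PE from heterozygosity, using the formula as printed in the text."""
    return H**2 * (1 - (1 - H) * H**2)


def pe_from_allele_freqs(freqs):
    """PE from allele frequencies, using the expression read from Table 20.7."""
    s2 = sum(p**2 for p in freqs)
    s3 = sum(p**3 for p in freqs)
    s4 = sum(p**4 for p in freqs)
    s5 = sum(p**5 for p in freqs)
    return 1 - 2 * s2 - 2 * s2**2 + 2 * s4 + s3 - 3 * s5 + 3 * s2 * s3


# Illustrative inputs: a heterozygosity value and a two-allele locus.
print(pe_from_heterozygosity(0.7845))
print(pe_from_allele_freqs([0.5, 0.5]))
```

For two equally frequent alleles the allele-frequency expression reduces to pq(1 - pq) = 0.1875, which is a convenient sanity check on the implementation.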

The probability of identity ($P_I$) is the probability that two individuals selected at random will have an identical genotype at the tested locus (Sensabaugh 1982). It is calculated by summing the squares of the genotype frequencies.
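The two definitions above can be sketched in a few lines. The genotype frequencies here are hypothetical and the function names are invented for illustration.

```python
def probability_of_identity(genotype_freqs):
    """Sum of squared genotype frequencies."""
    return sum(x * x for x in genotype_freqs)


def power_of_discrimination(genotype_freqs):
    """One minus the probability of identity."""
    return 1 - probability_of_identity(genotype_freqs)


# Hypothetical genotype frequencies that sum to 1.
genotype_freqs = [0.5, 0.3, 0.2]
p_identity = probability_of_identity(genotype_freqs)
pd_value = power_of_discrimination(genotype_freqs)
print(f"P_I = {p_identity:.4f}, PD = {pd_value:.4f}")
```

Applied to the probability of identity listed for D13S317 in Table 20.7 (0.0771), the same relationship gives PD = 1 - 0.0771 = 0.9229.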

Polymorphism information content or power of information content (PIC) reflects the probability that a given offspring of a parent carrying a rare allele at a locus will allow deduction of the parental genotype at the locus and is determined by summing the mating frequencies multiplied by the probability that an offspring will be informative (Botstein et al. 1980).
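The text gives no closed form for PIC. A commonly quoted expression attributed to Botstein et al. (1980) is PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², and the sketch below assumes that expression. The function name `pic` and the allele frequencies are hypothetical.

```python
from itertools import combinations


def pic(freqs):
    """PIC under the assumed Botstein-style expression."""
    homozygosity = sum(p * p for p in freqs)
    # Correction term summed over every unordered pair of distinct alleles.
    cross = sum(2 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return 1 - homozygosity - cross


# Hypothetical allele frequencies for a four-allele locus.
print(pic([0.1, 0.2, 0.3, 0.4]))
```

Because the correction term is never negative, PIC computed this way is always at most the expected heterozygosity for the same allele frequencies.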

Probability of a match (PM or pM) is sometimes referred to as the probability of a random match and is the inverse of the genotype frequency for a marker (or full profile).

Paternity index (PI) is a likelihood ratio expressing how strongly the alleles obtained by the child support the assumption that the tested man is the true biological father rather than an untested, randomly selected, unrelated man. The combined paternity index (CPI) is determined by multiplying the individual PIs for each locus tested.
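The multiplication step can be sketched as follows. The per-locus PI values are hypothetical and the function name is invented for illustration.

```python
import math


def combined_paternity_index(locus_pis):
    """Multiply the per-locus paternity indices into a combined index."""
    return math.prod(locus_pis)


# Hypothetical per-locus paternity indices for three tested loci.
locus_pis = [2.0, 1.5, 3.0]
cpi = combined_paternity_index(locus_pis)
print(f"CPI = {cpi}")
```

Because the indices are multiplied, a single locus with a PI below 1 lowers the combined value, while each locus with a PI above 1 raises it.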

Probability of paternity exclusion (PPE) is the probability, averaged over all possible mother-child pairs, that a random alleged father will be excluded from paternity (Chakraborty and Stivers 1996). 