Adequate Sample Sizes in DNA Databases Used for Allele Frequency Estimation

In an ideal world, DNA databases would include STR genotypes from every individual in a particular population to permit extremely accurate evaluations of DNA profile frequencies. However, practical considerations of cost and time necessitate a smaller database. Fortunately, it is possible to type a small subset of the population and reliably predict allele and genotype frequencies in the entire population, much as a telephone survey of several hundred individuals is used to try to predict the outcome of a political election. The key is collecting information from enough individuals to reliably estimate the frequency of the major alleles for a genetic locus.

Most published population data sets include on the order of 100-200 STR types per locus per population examined (see Table 20.1). In a key 1992 paper entitled 'Sample size requirements for addressing the population genetic issues of forensic use of DNA typing', Ranajit Chakraborty concluded that 100-150 individuals per population could provide an adequate sampling for a genetic locus provided that allele frequencies below 1% were not used in forensic calculations (Chakraborty 1992). The concept of minimum allele frequencies will be discussed below. Evett and Gill (1991) arrived at a similar conclusion with multi-locus matches of DNA profiles, namely that 100-120 individuals per locus per population were sufficient for robust likelihood calculations. Collecting information from more samples usually only improves the precision of a result rather than the accuracy of the allele count (see Foreman and Evett 2001).
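The effect of sample size on precision can be illustrated with a short calculation. The sketch below is not taken from the cited papers. It assumes simple binomial sampling, in which each of the N typed individuals contributes two independent alleles, and it uses a normal approximation for the interval, which becomes unreliable for rare alleles. The function names and the example frequency of 0.10 are hypothetical, chosen only for illustration.

```python
import math


def allele_frequency_standard_error(p, n_individuals):
    """Binomial standard error of an estimated allele frequency p.

    Assumption: each individual contributes two independent alleles,
    so the number of sampled alleles is 2 * n_individuals.
    """
    n_alleles = 2 * n_individuals
    return math.sqrt(p * (1.0 - p) / n_alleles)


def approximate_95_interval(p, n_individuals):
    """Approximate 95% interval (normal approximation), clipped to [0, 1].

    Assumption: the normal approximation is adequate; it is poor when
    p is close to 0 or 1 or when few alleles have been sampled.
    """
    se = allele_frequency_standard_error(p, n_individuals)
    return max(0.0, p - 1.96 * se), min(1.0, p + 1.96 * se)


# Illustrative allele frequency of 0.10 (a hypothetical value).
for n in (50, 100, 150, 500, 1000):
    low, high = approximate_95_interval(0.10, n)
    print(f"N = {n:4d} individuals: 0.10 -> roughly {low:.3f} to {high:.3f}")
```

Under these assumptions the standard error shrinks only with the square root of the sample size, so quadrupling the number of individuals merely halves the width of the interval. This is one way to see why a database of 100-150 individuals can already be adequate for the more common alleles.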

