Database Analyses

As population databases increase in numbers, virtually all populations will show some statistically significant departures from random mating proportions. Although statistically significant, many of the differences will be small enough to be practically unimportant.

Population DNA databases are important for comparison purposes to understand how frequent or how rare a crime scene DNA profile may be in a particular population. Hundreds of publications in the literature contain information on DNA profiles generated with genotype information from common short tandem repeat (STR) loci across tens of thousands of individuals collected from various populations around the world (see reference listing available on the NIST STRBase web site: http://www.cstl.nist.gov/biotech/strbase). Table 20.1 summarizes several sets of population data that have been published in the literature.

Population comparison DNA databases are often generated by individual forensic laboratories to assess variation in common local populations. This is particularly important for locales that may have an isolated and inbred population within their jurisdiction (see discussion on population substructure later in this chapter). For example, in Arizona it would be helpful to have a population database involving Native Americans such as Apaches and Navajos, since they live in fairly close-knit communities within Arizona and would be expected to have different genotype frequencies compared to Caucasians or African-Americans living in Arizona (see Chakraborty et al. 1999).

The primary goal of generating a population database is to find all 'common' alleles and sample these alleles multiple times in order to reliably estimate the alleles present in the population under consideration. A listing of observed alleles for the commonly used STR markers that have been reported in numerous population studies from around the world is contained in Appendix I. However, it is worth noting that some of these alleles, particularly the microvariant alleles (see Chapter 6), have only been observed a few times and are therefore rather rare (e.g., TH01 allele 6.3). Allele frequencies for common alleles at 15 different STR loci in the three major U.S. population groups may be found in Appendix II (Butler et al. 2003).


| Source | Population | Samples Typed (N) | Kit / Loci Typed | Parameters Evaluated | Reference |
|---|---|---|---|---|---|
| Arab | Yemenites | 100 | Profiler Plus kit (9 STRs) | MP, PD, PIC, PE, PI, H, h | Klintschar et al. (2001) |
| Australian | Caucasian | 2645 | Profiler Plus kit (9 STRs) | H, MP, PE | Bagdonavicius et al. (2002) |
| Austria | Caucasian | 204 | SGM Plus kit (10 STRs) | h, PD, PE | Steinlechner et al. (2001) |
| Guam | Filipinos | 99 | Profiler Plus kit (9 STRs) | h, PD, PE | Budowle et al. (2000) |
| Israel | Jewish | 124 | Profiler Plus kit (9 STRs), DYS19, DYS389I/II, DYS390, DYS391, DYS393, DYS287, D4S243, F13A1, D18S535, D12S391 | H, PD, CE, CE2 | Picornell et al. (2002) |
| United States | Caucasians | 302 | Identifiler kit (15 STRs) | H | Butler et al. (2003) |

Table 20.1

Summary of a few published population data sets. Forensic parameters evaluated include heterozygosity (H), homozygosity (h), power of discrimination (PD), power of exclusion (PE), chance of exclusion (CE), chance of exclusion if only one parent and child are typed (CE2), polymorphic information content (PIC), matching probability (MP), and paternity index (PI).
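As a rough illustration of how some of these parameters relate to allele frequencies, the sketch below computes expected homozygosity (h), expected heterozygosity (H), matching probability (MP), and power of discrimination (PD) for a single locus. It assumes Hardy-Weinberg proportions and uses the standard textbook definitions; the cited studies may instead report observed values counted directly from genotypes. The function name and the three-allele frequencies are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def hwe_locus_statistics(allele_freqs):
    """Return (h, H, MP, PD) for one locus, assuming Hardy-Weinberg proportions."""
    # Expected homozygosity: probability that both alleles are the same type.
    h = sum(p * p for p in allele_freqs)
    # Expected heterozygosity is the complement of homozygosity.
    H = 1.0 - h
    # Expected genotype frequencies: p^2 for homozygotes, 2pq for heterozygotes.
    genotype_freqs = [p * p for p in allele_freqs]
    genotype_freqs += [2 * p * q for p, q in combinations(allele_freqs, 2)]
    # Matching probability: chance that two random individuals share a genotype.
    mp = sum(g * g for g in genotype_freqs)
    # Power of discrimination is the complement of the matching probability.
    power_disc = 1.0 - mp
    return h, H, mp, power_disc


# Hypothetical allele frequencies, not taken from any cited study.
example_freqs = [0.5, 0.3, 0.2]
h, H, mp, power_disc = hwe_locus_statistics(example_freqs)
print(f"h={h:.4f} H={H:.4f} MP={mp:.4f} PD={power_disc:.4f}")
```

A locus with more alleles at more even frequencies gives a higher H and PD, which is why highly polymorphic STR markers are preferred for identity testing.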

This chapter will cover the primary aspects of generating and testing population DNA databases prior to their use in estimating STR profile frequencies. Calculations for determining the rarity of a particular STR profile will be discussed in Chapter 21 and will be based on allele frequencies found in Appendix II.

GENERATING A POPULATION DNA DATABASE

The primary steps in generating a population database, such as found in Appendix II, are illustrated in Figure 20.1. A laboratory must first decide on the number of samples that will be tested and what particular ethnic/racial groups are relevant to estimating DNA profile frequencies that might be encountered by the laboratory. Population databases are often generated by gathering a set of biological samples in the form of liquid blood from a local hospital or blood bank. Usually the individuals selected are healthy and hopefully unrelated to one another so that they reliably represent the population of interest. These 'convenience' samples are deemed reliable since they are similar to other data sets (NRC II, p. 58) (see Table 20.4). Usually the individual samples are devoid of identifiers that could be used to link the DNA typing results back to the donor (see below).

After the samples have been gathered, they are extracted, polymerase chain reaction (PCR) amplified, and genotyped at the STR loci of interest, such as the 13 core loci used in the FBI's Combined DNA Index System (CODIS). These single-source samples are typically processed in the same manner as the convicted felon samples mentioned in Chapter 18 using commercial STR kits (see Chapter 5) and standard interpretation guidelines to designate alleles (see Chapter 15).

Figure 20.1

Steps in generating and validating a population database that can then be used to estimate the frequency of an observed DNA profile in the population.

Following the gathering of the genotype data, the information is converted into allele frequencies by counting the number of times each allele is observed. Table 20.2 shows an example of allele counting for the locus D13S317 used to determine the Caucasian data in Appendix II. The observed alleles, ranging from 8 to 15 repeats, are listed across the top and down the left side. At the intersection of the rows and columns, the numbers of observed genotypes are listed. For example, starting in the top left-hand corner, the genotype 8,8 is seen nine times in the set of 302 individuals examined while the genotype 11,14 is seen 12 times. On the right side of Table 20.2, the numbers of observed alleles are counted by summing the row and column containing the allele of interest. Thus, the number of chromosomes containing allele 8 is equal to 68 from 9 + 9 + 1 + 17 + 13 + 10 + 0 + 0 for the row containing allele 8 plus nine for the column with the 8,8 genotype. The number of 8,8 genotypes is counted twice since both chromosomes contain an allele 8 at the D13S317 marker. The frequency for allele 8 is determined by dividing 68 by the total number of chromosomes, which is 604 since there are two chromosomes for each of the 302 individuals typed (68/604 = 0.11258). Note that there is only one allele 15 observed in this study, which comes from a 10,15 genotype. A little later we will discuss the concept of minimum allele frequencies in order to reliably estimate population allele frequencies.
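The allele-counting step can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure actually used to build Appendix II: the function name is hypothetical, and the dictionary holds only the genotypes whose counts are quoted in the text (those involving allele 8, plus the single 10,15 genotype), so only the tallies for alleles 8 and 15 are complete.

```python
from collections import Counter


def count_alleles(genotype_counts):
    """Tally allele observations from a {(allele_a, allele_b): individuals} mapping."""
    counts = Counter()
    for (a, b), n in genotype_counts.items():
        counts[a] += n
        # For a homozygote a == b, so the same allele is counted twice.
        counts[b] += n
    return counts


# Genotype counts quoted in the text for D13S317 (partial table).
genotypes = {
    (8, 8): 9, (8, 9): 9, (8, 10): 1, (8, 11): 17,
    (8, 12): 13, (8, 13): 10, (8, 14): 0, (8, 15): 0,
    (10, 15): 1,
}

n_individuals = 302
n_chromosomes = 2 * n_individuals  # two chromosomes per individual

allele_counts = count_alleles(genotypes)
freq_8 = allele_counts[8] / n_chromosomes
freq_15 = allele_counts[15] / n_chromosomes

print(allele_counts[8], round(freq_8, 5))
print(allele_counts[15], round(freq_15, 5))
```

Storing each genotype once as an unordered pair and adding its count to both alleles reproduces the "row plus column" rule, with homozygotes contributing two chromosomes each.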

Table 20.2

Genotype array and allele count for the STR locus D13S317 from unrelated U.S. Caucasian samples (n = 302 or 604 chromosomes measured). Only the observed allele frequency is reported in Appendix II. Note that allele 15 is only observed once and therefore is below the minimum recommended allele count of five. Thus, the value of 0.00828 (5/2N) should be used instead of 0.00166 for accurate estimation of genotype frequencies according to the recommendations of the National Research Council report (1996).

| Allele | Genotype Array (genotype: number observed) | Allele Count | Observed Frequency |
|---|---|---|---|
| 8 | 8,8: 9; 8,9: 9; 8,10: 1; 8,11: 17; 8,12: 13; 8,13: 10; 8,14: 0; 8,15: 0 | 68 | 0.11258 |
| 9 | 9,9: 1; …: 3; 9,15: 0 | 45 | 0.07450 |
| 10 | 10,10: 2; 10,11: 12; …: 3; 10,14: 2; 10,15: 1 | 31 | 0.05132 |
| 11 | 11,11: 37; 11,12: 54; 11,13: 21; 11,14: 12; 11,15: 0 | 205 | 0.33940 |
| 12 | 12,12: 21; …: 7; 12,15: 0 | 150 | 0.24834 |
| 13 | …: 5; 13,15: 0 | 75 | 0.12417 |
| 14 | 14,14: 0; 14,15: 0 | 29 | 0.04801 |
| 15 | 15,15: 0 | 1 | 0.00166 |

Allele frequency information allows for more compact data storage and enables independence testing, such as exact tests for Hardy-Weinberg equilibrium (see below). Typically the sample genotypes and allele frequencies associated with a particular ethnic/racial group are segregated to enable both intra- and inter-group comparisons. Several important issues will be considered in the next few pages including adequate sample sizes and sample selection for population databases.
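The minimum allele frequency adjustment described in the Table 20.2 caption (using 5/2N for any allele seen fewer than five times) can be sketched as follows. The function name is hypothetical, and the allele counts are those listed in the Allele Count column for D13S317; treat this as an illustration of the arithmetic, not as a prescribed implementation.

```python
def adjusted_frequency(allele_count, n_individuals, min_count=5):
    """Return the allele frequency, floored at min_count / (2N)."""
    n_chromosomes = 2 * n_individuals
    # Alleles seen fewer than min_count times are treated as if seen min_count times.
    return max(allele_count, min_count) / n_chromosomes


# Allele counts for D13S317 in the 302 U.S. Caucasian samples (Table 20.2).
observed_counts = {8: 68, 9: 45, 10: 31, 11: 205, 12: 150, 13: 75, 14: 29, 15: 1}
n_individuals = 302

observed = {a: c / (2 * n_individuals) for a, c in observed_counts.items()}
adjusted = {a: adjusted_frequency(c, n_individuals) for a, c in observed_counts.items()}

for allele in sorted(observed_counts):
    print(allele, round(observed[allele], 5), round(adjusted[allele], 5))
```

Only allele 15 changes here, from 1/604 to 5/604. Because the floor raises rare-allele frequencies, the adjusted values no longer sum exactly to one; they are intended for conservative profile frequency estimates rather than as a description of the sample itself.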
