## General Match Probability

As noted throughout this section, profile probabilities need to be calculated for a variety of scenarios. Balding (1999) points out that there are five different sets of people, defined by their possible relationship to a suspect: (1) the suspect's siblings, (2) his other relatives, (3) other members of his sub-population, (4) other members of his racial group, and (5) anyone else outside of his population (e.g., racial) group (see also Foreman and Evett 2001, Weir 2003).

### Table 21.5

Example calculations with NRCII recommendations for population substructure adjustments (see Appendix VI). Scenarios with theta equal to 0.01 and 0.03 are examined.

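Allele frequencies are from U.S. Caucasians (N = 302; Appendix II); sample in database.

| Locus | A1 | A2 | Allele 1 freq (p) | Allele 2 freq (q) | HWE formula | HWE calc freq | Rec. 4.1 formula | Rec. 4.1, θ = 0.01 | Rec. 4.1, θ = 0.03 | Rec. 4.10 formula | Rec. 4.10, θ = 0.01 | Rec. 4.10, θ = 0.03 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| D13S317 | 11 | 14 | 0.33940 | 0.04801 | $2pq$ | 0.0326 | $2pq$ | 0.0326 | 0.0326 | eq. 4.10b | 0.0386 | 0.0504 |
| TH01 | 6 | 6 | 0.23179 | - | $p^2$ | 0.0537 | $p^2 + p(1-p)\theta$ | 0.0555 | 0.0591 | eq. 4.10a | 0.0628 | 0.0821 |
| D18S51 | 14 | 16 | 0.13742 | 0.13907 | $2pq$ | 0.0382 | $2pq$ | 0.0382 | 0.0382 | eq. 4.10b | 0.0419 | 0.0493 |
| D21S11 | 28 | 30 | 0.15894 | 0.27815 | $2pq$ | 0.0884 | $2pq$ | 0.0884 | 0.0884 | eq. 4.10b | 0.0927 | 0.1011 |
| D3S1358 | 16 | 17 | 0.25331 | 0.21523 | $2pq$ | 0.1090 | $2pq$ | 0.1090 | 0.1090 | eq. 4.10b | 0.1129 | 0.1206 |
| D5S818 | 12 | 13 | 0.38411 | 0.14073 | $2pq$ | 0.1081 | $2pq$ | 0.1081 | 0.1081 | eq. 4.10b | 0.1131 | 0.1228 |
| D7S820 | 9 | 9 | 0.17715 | - | $p^2$ | 0.0314 | $p^2 + p(1-p)\theta$ | 0.0328 | 0.0358 | eq. 4.10a | 0.0390 | 0.0556 |
| D8S1179 | 12 | 14 | 0.18543 | 0.16556 | $2pq$ | 0.0614 | $2pq$ | 0.0614 | 0.0614 | eq. 4.10b | 0.0654 | 0.0733 |
| CSF1PO | 10 | 10 | 0.21689 | - | $p^2$ | 0.0470 | $p^2 + p(1-p)\theta$ | 0.0487 | 0.0521 | eq. 4.10a | 0.0558 | 0.0744 |
| FGA | 21 | 22 | 0.18543 | 0.21854 | $2pq$ | 0.0810 | $2pq$ | 0.0810 | 0.0810 | eq. 4.10b | 0.0851 | 0.0930 |
| D16S539 | 9 | 11 | 0.11258 | 0.32119 | $2pq$ | 0.0723 | $2pq$ | 0.0723 | 0.0723 | eq. 4.10b | 0.0773 | 0.0871 |
| TPOX | 8 | 8 | 0.53477 | - | $p^2$ | 0.2860 | $p^2 + p(1-p)\theta$ | 0.2885 | 0.2934 | eq. 4.10a | 0.2983 | 0.3227 |
| VWA | 17 | 18 | 0.28146 | 0.20033 | $2pq$ | 0.1128 | $2pq$ | 0.1128 | 0.1128 | eq. 4.10b | 0.1167 | 0.1245 |
| AMEL | X | Y | | | | | | | | | | |
| Profile (product of the 13 STR loci) | | | | | | $1.20 \times 10^{-15}$ | | $1.35 \times 10^{-15}$ | $1.70 \times 10^{-15}$ | | $3.92 \times 10^{-15}$ | |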
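As a rough illustration of how the adjusted columns in Table 21.5 arise, the following Python sketch evaluates the Hardy-Weinberg frequency, the Recommendation 4.1 homozygote correction, and the Recommendation 4.10 expressions for a single locus. The function names are hypothetical, and the 4.10a and 4.10b expressions are assumed to be the "unrelated" formulas of Table 21.7, so this is a sketch under those assumptions rather than a reference implementation.

```python
def hwe_freq(p, q=None):
    """Genotype frequency under Hardy-Weinberg equilibrium.

    Pass only p for a homozygote (p^2); pass p and q for a heterozygote (2pq).
    """
    return p * p if q is None else 2 * p * q


def rec_4_1(p, q=None, theta=0.01):
    """NRC II Recommendation 4.1: only homozygotes are inflated by theta."""
    if q is None:
        return p * p + p * (1 - p) * theta
    return 2 * p * q


def rec_4_10(p, q=None, theta=0.01):
    """NRC II Recommendation 4.10 (assumed forms of eq. 4.10a and 4.10b)."""
    denom = (1 + theta) * (1 + 2 * theta)
    if q is None:  # eq. 4.10a, homozygote
        return (2 * theta + (1 - theta) * p) * (3 * theta + (1 - theta) * p) / denom
    # eq. 4.10b, heterozygote
    return 2 * (theta + (1 - theta) * p) * (theta + (1 - theta) * q) / denom


if __name__ == "__main__":
    # TH01 6,6 and D13S317 11,14 with the allele frequencies from Table 21.5
    for theta in (0.01, 0.03):
        print(theta,
              round(rec_4_1(0.23179, theta=theta), 4),
              round(rec_4_10(0.23179, theta=theta), 4),
              round(rec_4_10(0.33940, 0.04801, theta=theta), 4))
```

Passing a single allele frequency signals a homozygote, which keeps the three functions parallel to the three formula columns of the table.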

### Table 21.6

Example calculations with corrections for relatives using the NRCII recommended formula.

Allele frequencies are from U.S. Caucasians (N = 302; Appendix II); sample in database.

| Locus | A1 | A2 | Allele 1 freq (p) | Allele 2 freq (q) | HWE formula | HWE calc freq | Rec. 4.4 equation | F = 1/4 (parent) | F = 1/8 (half sib) | F = 1/16 (1st cousin) | Full sib equation | Full sib |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| D13S317 | 11 | 14 | 0.33940 | 0.04801 | $2pq$ | 0.0326 | eq. 4.8b | 0.1937 | 0.1131 | 0.0729 | eq. 4.9b | 0.3550 |
| TH01 | 6 | 6 | 0.23179 | - | $p^2$ | 0.0537 | eq. 4.8a | 0.2318 | 0.1428 | 0.0982 | eq. 4.9a | 0.3793 |
| D16S539 | 9 | 11 | 0.11258 | 0.32119 | $2pq$ | 0.0723 | eq. 4.8b | 0.2169 | 0.1446 | 0.1085 | eq. 4.9b | 0.3765 |
| D18S51 | 14 | 16 | 0.13742 | 0.13907 | $2pq$ | 0.0382 | eq. 4.8b | 0.1382 | 0.0882 | 0.0632 | eq. 4.9b | 0.3287 |
| D21S11 | 28 | 30 | 0.15894 | 0.27815 | $2pq$ | 0.0884 | eq. 4.8b | 0.2185 | 0.1535 | 0.1209 | eq. 4.9b | 0.3814 |
| D3S1358 | 16 | 17 | 0.25331 | 0.21523 | $2pq$ | 0.1090 | eq. 4.8b | 0.2343 | 0.1717 | 0.1403 | eq. 4.9b | 0.3944 |
| D5S818 | 12 | 13 | 0.38411 | 0.14073 | $2pq$ | 0.1081 | eq. 4.8b | 0.2624 | 0.1853 | 0.1467 | eq. 4.9b | 0.4082 |
| D7S820 | 9 | 9 | 0.17715 | - | $p^2$ | 0.0314 | eq. 4.8a | 0.1772 | 0.1043 | 0.0678 | eq. 4.9a | 0.3464 |
| D8S1179 | 12 | 14 | 0.18543 | 0.16556 | $2pq$ | 0.0614 | eq. 4.8b | 0.1755 | 0.1184 | 0.0899 | eq. 4.9b | 0.3531 |
| CSF1PO | 10 | 10 | 0.21689 | - | $p^2$ | 0.0470 | eq. 4.8a | 0.2169 | 0.1320 | 0.0895 | eq. 4.9a | 0.3702 |
| FGA | 21 | 22 | 0.18543 | 0.21854 | $2pq$ | 0.0810 | eq. 4.8b | 0.2020 | 0.1415 | 0.1113 | eq. 4.9b | 0.3713 |
| TPOX | 8 | 8 | 0.53477 | - | $p^2$ | 0.2860 | eq. 4.8a | 0.5348 | 0.4104 | 0.3482 | eq. 4.9a | 0.5889 |
| VWA | 17 | 18 | 0.28146 | 0.20033 | $2pq$ | 0.1128 | eq. 4.8b | 0.2409 | 0.1768 | 0.1448 | eq. 4.9b | 0.3986 |
| AMEL | X | Y | | | | | | | | | | |
| Profile (product of the 13 STR loci) | | | | | | $1.20 \times 10^{-15}$ | | $3.17 \times 10^{-9}$ | $1.68 \times 10^{-11}$ | $3.74 \times 10^{-13}$ | | 1 in 247 616 |
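The relative corrections in Table 21.6 can be reproduced with a short Python sketch. The table cites equations 4.8a, 4.8b, 4.9a, and 4.9b only by number, so the expressions below are assumptions: they are the commonly quoted NRC II forms, $p^2 + 4p(1-p)F$ and $2pq + 2(p+q-4pq)F$ for a relative with kinship coefficient F, and $(1+2p+p^2)/4$ and $(1+p+q+2pq)/4$ for full siblings. The function names are hypothetical.

```python
def relative_match(p, q=None, F=0.25):
    """Match probability for a relative with kinship coefficient F.

    Assumed forms of NRC II eq. 4.8a (homozygote) and 4.8b (heterozygote).
    F = 1/4 parent-child, 1/8 half siblings, 1/16 first cousins.
    """
    if q is None:
        return p * p + 4 * p * (1 - p) * F
    return 2 * p * q + 2 * (p + q - 4 * p * q) * F


def full_sib_match(p, q=None):
    """Match probability for a full sibling.

    Assumed forms of NRC II eq. 4.9a (homozygote) and 4.9b (heterozygote).
    """
    if q is None:
        return (1 + 2 * p + p * p) / 4
    return (1 + p + q + 2 * p * q) / 4


if __name__ == "__main__":
    # D13S317 11,14 from Table 21.6
    p, q = 0.33940, 0.04801
    for label, F in (("parent", 1 / 4), ("half sib", 1 / 8), ("1st cousin", 1 / 16)):
        print(label, round(relative_match(p, q, F), 4))
    print("full sib", round(full_sib_match(p, q), 4))
```

Under these assumed forms the D13S317 and TH01 rows of the table are recovered to four decimal places, which is the main reason for trusting them here.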

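### Table 21.7

Effects of family relatedness on match probabilities (adapted from Weir 2003, p. 839). Notice that the unrelated formulas are the same as those for NRC II recommendations 4.10a and 4.10b (see Appendix VI). Worked examples use $\theta = 0.01$ and the allele frequencies found in Appendix II for Caucasians: p(TH01 allele 6) = 0.23179; p(D13 allele 11) = 0.33940; p(D13 allele 14) = 0.04801.

Homozygotes ($A_iA_i$)

| Relationship | Match probability formula | Result with TH01 6,6 |
| --- | --- | --- |
| Full siblings | $\dfrac{(1+p_i)^2 + (7+7p_i-2p_i^2)\theta + (16-9p_i+p_i^2)\theta^2}{4(1+\theta)(1+2\theta)}$ | 0.38921 |
| Parent and child | $\dfrac{2\theta+(1-\theta)p_i}{1+\theta}$ | 0.24700 |
| Half siblings | $\dfrac{[2\theta+(1-\theta)p_i][1+5\theta+(1-\theta)p_i]}{2(1+\theta)(1+2\theta)}$ | 0.15492 |
| First cousins | $\dfrac{[2\theta+(1-\theta)p_i][1+11\theta+3(1-\theta)p_i]}{4(1+\theta)(1+2\theta)}$ | 0.10888 |
| Unrelated | $\dfrac{[2\theta+(1-\theta)p_i][3\theta+(1-\theta)p_i]}{(1+\theta)(1+2\theta)}$ | 0.06283 |

Heterozygotes ($A_iA_j$)

| Relationship | Match probability formula | Result with D13 11,14 |
| --- | --- | --- |
| Full siblings | $\dfrac{(1+p_i+p_j+2p_ip_j) + (5+3p_i+3p_j-4p_ip_j)\theta + 2(4-2p_i-2p_j+p_ip_j)\theta^2}{4(1+\theta)(1+2\theta)}$ | 0.35955 |
| Parent and child | $\dfrac{2\theta+(1-\theta)(p_i+p_j)}{2(1+\theta)}$ | 0.19977 |
| Half siblings | $\dfrac{(p_i+p_j+4p_ip_j) + (2+5p_i+5p_j-8p_ip_j)\theta + (8-6p_i-6p_j+4p_ip_j)\theta^2}{4(1+\theta)(1+2\theta)}$ | 0.11921 |
| First cousins | $\dfrac{(p_i+p_j+12p_ip_j) + (2+13p_i+13p_j-24p_ip_j)\theta + 2(8-7p_i-7p_j+6p_ip_j)\theta^2}{8(1+\theta)(1+2\theta)}$ | 0.07893 |
| Unrelated | $\dfrac{2[\theta+(1-\theta)p_i][\theta+(1-\theta)p_j]}{(1+\theta)(1+2\theta)}$ | 0.03864 |

For comparison, the unadjusted heterozygote frequency for D13 11,14 is $2pq = 0.03259$.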
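The worked results in Table 21.7 can be checked numerically. The Python sketch below codes the full-sibling, parent-child, and unrelated expressions directly. As an assumption of this sketch, it obtains the half-sibling and first-cousin values as weighted mixtures of the parent-child and unrelated probabilities, using the chance that the two relatives share one allele identical by descent (1/2 and 1/4 respectively) as the weight. All function names are hypothetical.

```python
def unrelated(p, q=None, theta=0.01):
    """Theta-adjusted match probability for an unrelated person."""
    d = (1 + theta) * (1 + 2 * theta)
    if q is None:
        return (2 * theta + (1 - theta) * p) * (3 * theta + (1 - theta) * p) / d
    return 2 * (theta + (1 - theta) * p) * (theta + (1 - theta) * q) / d


def parent_child(p, q=None, theta=0.01):
    """Theta-adjusted match probability for a parent or child."""
    if q is None:
        return (2 * theta + (1 - theta) * p) / (1 + theta)
    return (2 * theta + (1 - theta) * (p + q)) / (2 * (1 + theta))


def full_sibs(p, q=None, theta=0.01):
    """Theta-adjusted match probability for a full sibling."""
    d = 4 * (1 + theta) * (1 + 2 * theta)
    t2 = theta * theta
    if q is None:
        return ((1 + p) ** 2 + (7 + 7 * p - 2 * p * p) * theta
                + (16 - 9 * p + p * p) * t2) / d
    return ((1 + p + q + 2 * p * q) + (5 + 3 * p + 3 * q - 4 * p * q) * theta
            + 2 * (4 - 2 * p - 2 * q + p * q) * t2) / d


def one_ibd_mixture(k1, p, q=None, theta=0.01):
    """Mixture for relatives who share one allele IBD with probability k1
    and none otherwise (assumed: half siblings k1 = 1/2, first cousins 1/4)."""
    return k1 * parent_child(p, q, theta) + (1 - k1) * unrelated(p, q, theta)


if __name__ == "__main__":
    p, q = 0.33940, 0.04801  # D13S317 alleles 11 and 14
    print("full sibs    ", round(full_sibs(p, q), 5))
    print("parent/child ", round(parent_child(p, q), 5))
    print("half sibs    ", round(one_ibd_mixture(0.5, p, q), 5))
    print("first cousins", round(one_ibd_mixture(0.25, p, q), 5))
    print("unrelated    ", round(unrelated(p, q), 5))
```

Writing the more distant relationships as mixtures keeps the code short and makes the ordering of the results easy to see: each value must fall between the parent-child and unrelated figures.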

One way to avoid calculating a separate profile probability for each of these scenarios is to use general match probabilities that have been calculated by the theoretically most conservative method, which takes the two most common alleles for each locus (D.N.A. Box 21.2) (Foreman and Evett 2001). The primary advantage of this approach is that repeated calculations are not required for each profile observed. Foreman and Evett (2001) also advocate a general match probability that avoids case-specific calculations because it is difficult to provide any sound statistical support for probabilities of such a small magnitude (e.g., $10^{-21}$).

### D.N.A. Box 21.2 General match probability values

In a paper performing statistical analyses to support forensic interpretation of the 10 loci present in the SGM Plus kit, Foreman and Evett (2001) advocate the use of general probability values when reporting full matching STR profiles. With the 10 STR loci present in the SGM Plus kit used in the UK and Europe, the probabilities are as follows (see Foreman and Evett 2001, Table 4):

| Relationship with suspect | Match probability |
| --- | --- |
| Sibling | 1 in 10 000 |
| Parent/child | 1 in 1 million |
| Half-sibling or uncle/nephew | 1 in 10 million |
| First cousin | 1 in 100 million |
| Unrelated | 1 in 1 billion |

They argue that adopting such figures would eliminate the need to calculate case-specific match probabilities, making it much easier to present information to the court. The match probabilities for specific STR profiles are typically several orders of magnitude smaller than those given above, which were calculated from the theoretically most common SGM Plus profile. Thus, these probabilities should provide a fair and reasonable assessment of the weight of DNA evidence for each category and in the end would probably be favorable to the suspect (defendant).

A similar calculation for a full match with the 13 CODIS loci, using the most common alleles observed in U.S. population databases such as Appendix II, would yield even more extreme general match probabilities (that is, smaller probabilities and larger "1 in N" figures), since more STR loci are being examined.

Sources:

Foreman, L.A. and Evett, I.W. (2001) Statistical analyses to support forensic interpretation for a new 10-locus STR profiling system. International Journal of Legal Medicine, 114, 147-155.

Balding, D.J. (1999) When can a DNA profile be regarded as unique? Science & Justice, 39, 257-260.

### LIKELIHOOD RATIO

As noted previously, when matching STR profiles are obtained between a suspect (who then becomes the defendant in a court case; in other words the known reference sample, K) and the crime scene evidence (questioned sample, Q), it is necessary to quantify the evidentiary value of this match.

Another approach, besides the profile frequency (match probability) estimate just described, is the use of likelihood ratios (LR). LRs involve a comparison of the probabilities of the evidence under two alternative propositions. These mutually exclusive hypotheses represent the position of the prosecution, namely that the DNA from the crime scene originated from the suspect, and the position of the defense, that the DNA just happens to coincidentally match the defendant and is instead from an unknown person in the population at large.

A likelihood ratio is a ratio of two probabilities of the same evidence under different hypotheses. For example, if a DNA profile generated from a crime scene evidence sample matches a suspect's DNA profile, then there are generally two possible hypotheses for why the profiles match each other: (1) the suspect matches because he left his biological sample at the crime scene or (2) the true perpetrator is still at large and just happens to match the suspect at the DNA markers examined.

Typically the first hypothesis (the one championed by the prosecution) is placed in the numerator of the likelihood ratio, while the second hypothesis, that someone other than the defendant committed the crime (which is of course the defense's position), is placed in the denominator.

Thus, in mathematical terms:

$$\mathrm{LR} = \frac{H_p}{H_d}$$

or verbally, the likelihood ratio equals the probability of the evidence under the hypothesis of the prosecution divided by the probability of the evidence under the hypothesis of the defense. Since the hypothesis of the prosecution is that the defendant is the source of the crime scene DNA, a match is certain under it and $H_p = 1$ (assumes 100% probability). On the other hand, the probability under the hypothesis of the defense, that the profile originated from someone else, can be calculated from the genotype frequency of the particular STR profile. If the STR typing result is heterozygous, then this probability would be $2pq$, where $p$ is the frequency of allele 1 and $q$ is the frequency of allele 2 in the relevant population for the locus in question. Alternatively, for a homozygous STR type, $H_d$ would be $p^2$.

Therefore, for a heterozygous genotype,

$$\mathrm{LR} = \frac{H_p}{H_d} = \frac{1}{2pq}$$

If the STR type in question was D13S317 alleles 11 and 14, then $p$ is 0.33940 and $q$ is 0.04801 for the Caucasian population (Appendix II). The likelihood ratio for the D13S317 genotype match then becomes

$$\mathrm{LR} = \frac{H_p}{H_d} = \frac{1}{2pq} = \frac{1}{2(0.33940)(0.04801)} = \frac{1}{0.03259} = 30.7$$

Note that the rarer the particular STR genotype is, the higher the likelihood ratio will be, since the two are reciprocally related. In its simplest form, when discrete alleles and independent marker systems are used, the LR for each locus is simply the inverse of the relative frequency of the observed genotype in the relevant population. Of course, LRs can become much more complicated if mixtures or alternative scenarios for the evidence are possible (see Chapter 22, Table 22.1). The product of all locus-specific LRs gives the full profile LR, which in the example of the Caucasian data shown in Table 21.2 comes to $8.37 \times 10^{14}$ (the inverse of $1.20 \times 10^{-15}$).

If the value of a likelihood ratio is greater than one, then it provides support for the prosecution's case. If, on the other hand, the LR is less than one, then the defense's case is supported. In the example shown here, if a crime stain possessing D13S317 alleles 11 and 14 matches a suspect who also possesses a D13S317 genotype of 11,14, then the evidence is 30.7 times more likely if the suspect left it than if it came from some unknown person in the general Caucasian population.
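The single-locus and full-profile likelihood ratios described above can be sketched in Python as follows. The allele frequencies are those listed for the example profile in Table 21.5; the dictionary layout and function names are hypothetical choices for this illustration, and the sketch assumes the simplest case (a single-source match, independent loci, no θ adjustment).

```python
import math

# (p, q) allele frequencies for the example profile; q is None for homozygotes.
PROFILE = {
    "D13S317": (0.33940, 0.04801),
    "TH01": (0.23179, None),
    "D18S51": (0.13742, 0.13907),
    "D21S11": (0.15894, 0.27815),
    "D3S1358": (0.25331, 0.21523),
    "D5S818": (0.38411, 0.14073),
    "D7S820": (0.17715, None),
    "D8S1179": (0.18543, 0.16556),
    "CSF1PO": (0.21689, None),
    "FGA": (0.18543, 0.21854),
    "D16S539": (0.11258, 0.32119),
    "TPOX": (0.53477, None),
    "VWA": (0.28146, 0.20033),
}


def genotype_frequency(p, q=None):
    """Hd in the text: p^2 for a homozygote, 2pq for a heterozygote."""
    return p * p if q is None else 2 * p * q


def likelihood_ratio(p, q=None):
    """LR = Hp / Hd with Hp = 1."""
    return 1.0 / genotype_frequency(p, q)


def profile_likelihood_ratio(profile):
    """Product of the locus-specific LRs, assuming independent loci."""
    return math.prod(likelihood_ratio(p, q) for p, q in profile.values())


if __name__ == "__main__":
    print(f"D13S317 LR: {likelihood_ratio(*PROFILE['D13S317']):.1f}")
    print(f"Profile LR: {profile_likelihood_ratio(PROFILE):.2e}")
```

Because the profile LR is a product of thirteen reciprocals, small rounding differences in the per-locus frequencies can shift the last reported digit, so the result should land near, but not necessarily exactly on, the figure quoted in the text.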

When considering the strength of a likelihood ratio in terms of supporting the prosecution's position, the following guidelines have been suggested (Evett and Weir 1998, p. 226):

| If the likelihood ratio is... | Then the evidence provides... |
| --- | --- |
| 1 to 10 | limited support |
| 10 to 100 | moderate support |
| 100 to 1000 | strong support |
| 1000 and greater | very strong support |

With a 13-locus STR match likelihood ratio of $8.37 \times 10^{14}$ based on a full profile with unambiguous results (e.g., no mixture present), the evidence provides extremely strong support for the proposition that the suspect supplied the evidentiary sample.

Given that DNA evidence from forensic samples can yield likelihood ratios, and reciprocals of random match probabilities, that exceed the world population many fold, the Federal Bureau of Investigation (FBI) Laboratory has adopted a source attribution policy (Budowle et al. 2000, DAB 2000). With average random match probabilities of less than one in a trillion using the 13 core STR loci (Chakraborty et al. 1999), it becomes possible, within the context of a particular case, to state with reasonable scientific certainty and a high degree of confidence that an individual is the source of an evidentiary DNA sample. The logic behind this source attribution policy is provided below.

If $p_x$ is the random match probability for a given evidentiary profile X, then $(1 - p_x)^N$ is the probability of not observing the particular profile in a sample of N unrelated individuals.

When this probability is greater than or equal to a $1 - \alpha$ confidence level (with $\alpha$ being 0.01 for 99%), then

$$(1 - p_x)^N \ge 1 - \alpha \quad \text{or} \quad p_x \le 1 - (1 - \alpha)^{1/N},$$

which enables the calculation that if N is approximately the size of the U.S. population (N = 300 000 000), then a random match probability of less than $3.35 \times 10^{-11}$ will confer at least 99% confidence that the evidentiary profile is unique in the population (Budowle et al. 2000). Table 21.8 lists the random match probability thresholds for various population sizes and confidence levels.

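### Table 21.8

Random match probability thresholds for source attribution at various population sizes and confidence levels (adapted from Budowle et al. 2000). With a random match probability of $1.20 \times 10^{-15}$ in U.S. Caucasians (see Tables 21.2 and 21.5), the example STR profile would be considered 'unique.' Columns give the confidence level ($1 - \alpha$).

| Sample size (N) | 0.90 | 0.95 | 0.99 | 0.999 |
| --- | --- | --- | --- | --- |
| 2 | | | | $5.00 \times 10^{-4}$ |
| 3 | | | | $3.33 \times 10^{-4}$ |
| 4 | | | | $2.50 \times 10^{-4}$ |
| 5 | $2.09 \times 10^{-2}$ | $1.02 \times 10^{-2}$ | $2.01 \times 10^{-3}$ | $2.00 \times 10^{-4}$ |
| 6 | $1.74 \times 10^{-2}$ | $8.51 \times 10^{-3}$ | $1.67 \times 10^{-3}$ | $1.67 \times 10^{-4}$ |
| 7 | $1.49 \times 10^{-2}$ | $7.30 \times 10^{-3}$ | $1.43 \times 10^{-3}$ | $1.43 \times 10^{-4}$ |
| 8 | $1.31 \times 10^{-2}$ | $6.39 \times 10^{-3}$ | $1.26 \times 10^{-3}$ | $1.25 \times 10^{-4}$ |
| 9 | $1.16 \times 10^{-2}$ | $5.68 \times 10^{-3}$ | $1.12 \times 10^{-3}$ | $1.11 \times 10^{-4}$ |
| 10 | $1.05 \times 10^{-2}$ | | | $1.00 \times 10^{-4}$ |
| 25 | $4.21 \times 10^{-3}$ | | | $4.00 \times 10^{-5}$ |
| 50 | | | | $2.00 \times 10^{-5}$ |
| 100 | | | | $1.00 \times 10^{-5}$ |
| 1000 | | | | $1.00 \times 10^{-6}$ |
| 100 000 | | | | |
| 1 000 000 | $1.05 \times 10^{-7}$ | $5.13 \times 10^{-8}$ | $1.01 \times 10^{-8}$ | $1.00 \times 10^{-9}$ |
| 10 000 000 | $1.05 \times 10^{-8}$ | $5.13 \times 10^{-9}$ | $1.01 \times 10^{-9}$ | $1.00 \times 10^{-10}$ |
| 50 000 000 | $2.11 \times 10^{-9}$ | $1.03 \times 10^{-9}$ | $2.01 \times 10^{-10}$ | $2.00 \times 10^{-11}$ |
| 260 000 000 | $4.05 \times 10^{-10}$ | $1.97 \times 10^{-10}$ | $3.87 \times 10^{-11}$ | $3.85 \times 10^{-12}$ |
| 300 000 000 | $3.51 \times 10^{-10}$ | $1.71 \times 10^{-10}$ | $3.35 \times 10^{-11}$ | $3.33 \times 10^{-12}$ |
| 1 000 000 000 | $1.05 \times 10^{-10}$ | $5.13 \times 10^{-11}$ | $1.01 \times 10^{-11}$ | $1.00 \times 10^{-12}$ |
| 6 000 000 000 (world pop) | $1.76 \times 10^{-11}$ | $8.55 \times 10^{-12}$ | $1.68 \times 10^{-12}$ | $1.67 \times 10^{-13}$ |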
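Any entry of Table 21.8 can be regenerated from the threshold relation $p_x \le 1 - (1 - \alpha)^{1/N}$ given in the text. The Python sketch below does this; the function name is hypothetical, and the use of `math.expm1` and `math.log` is simply an implementation choice made here to avoid losing precision when N is very large, not something taken from Budowle et al. (2000).

```python
import math


def rmp_threshold(n, confidence):
    """Largest random match probability p_x with (1 - p_x)**n >= confidence.

    Equivalent to 1 - confidence**(1/n), evaluated as -expm1(log(confidence)/n)
    so that tiny results are not swamped by floating-point rounding.
    """
    return -math.expm1(math.log(confidence) / n)


def is_unique(rmp, n, confidence):
    """True when the profile's random match probability is below the threshold."""
    return rmp < rmp_threshold(n, confidence)


if __name__ == "__main__":
    for n in (5, 1_000_000, 300_000_000, 6_000_000_000):
        row = "  ".join(f"{rmp_threshold(n, c):.2e}" for c in (0.90, 0.95, 0.99, 0.999))
        print(f"{n:>13,d}  {row}")
    # Example profile from the text: 1.20e-15 in U.S. Caucasians
    print(is_unique(1.20e-15, 6_000_000_000, 0.999))
```

For the example profile, the random match probability of $1.20 \times 10^{-15}$ falls well below even the strictest threshold in the table (world population, 99.9% confidence), which is why the caption describes it as 'unique.'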

A statement provided with a report involving a source attribution might include the following words: 'In the absence of identical twins or close relatives, it can be concluded to a reasonable scientific certainty that the DNA from (x) and from (y) came from the same individual' or 'Reasonable scientific certainty means that you are (x%) certain that you would not see this profile in a sample of (N) unrelated individuals.'

It should be pointed out that if the possibility exists that a close relative of the accused had access to the crime scene and may have been a contributor of the evidence, then the best action is to obtain a reference sample from the relative (DAB 2000). This scenario should be sufficient probable cause for obtaining a reference sample, typing it with the same STR markers as the evidence, and using this information to resolve the question of whether or not the relative carries the same DNA profile as the accused.

### OTHER TOPICS OF INTEREST