Issues With Sequence Quality

Concerns about the quality of sequences in mtDNA databases, and the impact that errors might have on accurately estimating random match frequencies, have been raised by Peter Forster and Hans Bandelt in several recent publications (Röhl et al. 2001, Bandelt et al. 2001, Bandelt et al. 2002, Forster 2003, Dennis 2003). Using phylogenetic analysis, the similarities and differences among multiple closely related DNA sequences (i.e., from the same region) can be compared systematically (see Wilson and Allard 2004). Sequence alignments are created and compared to look for samples that are extremely different from the rest. Extreme or unusual differences may indicate that the sample was contaminated or that the sequence data were incorrectly recorded. For example, a laboratory may combine the HV1 data from one sample with the HV2 sequence from another and thereby create an artificial recombinant, or accidental composite, sequence. Thus, phylogenetic analyses can play a role in verifying sequence quality (Bandelt et al. 2001, Wilson and Allard 2004).
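The idea of screening an alignment for samples that are extremely different can be illustrated with a much simpler stand-in for a full phylogenetic analysis: counting pairwise differences. The sketch below is an assumption-laden illustration, not the method used in the cited studies. The function names (`hamming`, `flag_outliers`), the cutoff rule (mean distance greater than a multiple of the median), and the toy sequences are all hypothetical and chosen only to show the principle.

```python
from itertools import combinations
from statistics import median


def hamming(a: str, b: str) -> int:
    """Count the positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def flag_outliers(aligned: dict[str, str], factor: float = 2.0) -> list[str]:
    """Return sample IDs whose mean distance to all other samples
    exceeds `factor` times the median of those mean distances.

    The cutoff rule is an arbitrary illustrative choice.
    """
    if len(aligned) < 2:
        return []
    totals = {name: 0 for name in aligned}
    for (n1, s1), (n2, s2) in combinations(aligned.items(), 2):
        d = hamming(s1, s2)
        totals[n1] += d
        totals[n2] += d
    others = len(aligned) - 1
    means = {name: total / others for name, total in totals.items()}
    cutoff = factor * median(means.values())
    return [name for name, mean in means.items() if mean > cutoff]


# Made-up aligned fragments for illustration only (not real mtDNA data).
samples = {
    "S1": "ACGTACGTAC",
    "S2": "ACGTACGTAT",
    "S3": "ACGTACGTAC",
    "S4": "ACGAACGTAC",
    "S5": "TGCATGCATG",
}
suspects = flag_outliers(samples)
print(suspects)
```

A sample flagged this way is only a candidate for review. Whether it reflects contamination, a recording error, or a genuinely divergent lineage still has to be settled by going back to the raw data.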

Errors that can creep into mitochondrial DNA population databases can be segregated into four different classes (Parson et al. 2004): (1) mistakes in the course of transcription of the results (i.e., clerical errors); (2) sample mix-up (e.g., putting data from HV1 on one sample together with data from HV2 on another sample); (3) contamination; and (4) use of different nomenclatures.

In a pilot collaborative study of 21 laboratories, 14 non-concordant haplotypes (16 individual errors) were observed out of a total of 150 submitted samples/haplotypes, representing the examination of approximately 150 000 nucleotides (Parson et al. 2004). Measures are being put into place for complete electronic transfer of data and base calling to avoid the primary problem of clerical errors when transferring information from raw sequence data to the final report. In the future, mtDNA databases may require retention of raw data for population samples so that the authenticity of results can be verified more easily should an inquiry into the origin of sequence results be needed at a later date (Parson et al. 2004).
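To put the pilot-study figures in perspective, the short calculation below converts them into rates. It uses only the counts quoted above from Parson et al. (2004). The variable names are illustrative, and the nucleotide total is the approximate value given in the text, so the per-nucleotide figure is an approximation as well.

```python
# Counts as reported in the text (Parson et al. 2004).
nonconcordant_haplotypes = 14
individual_errors = 16
submitted_haplotypes = 150
nucleotides_examined = 150_000  # approximate, per the text

# Fraction of submitted haplotypes that contained at least one discrepancy.
haplotype_error_rate = nonconcordant_haplotypes / submitted_haplotypes

# Individual errors per nucleotide examined, and its reciprocal.
per_nucleotide_error_rate = individual_errors / nucleotides_examined
nucleotides_per_error = nucleotides_examined / individual_errors

print(f"Non-concordant haplotypes: {haplotype_error_rate:.1%}")
print(f"Errors per nucleotide: {per_nucleotide_error_rate:.4%}")
print(f"About one error per {nucleotides_per_error:,.0f} nucleotides")
```

The two rates differ by orders of magnitude: roughly 9% of haplotypes were affected, even though only about one nucleotide in 9,000 was wrong. A single miscalled or mistyped base is enough to make an entire haplotype non-concordant.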


