FIGURE 3-32 A signature sequence in the EF-1α/EF-Tu protein family (alignment not reproduced). Both archaebacteria and eukaryotes have the signature, although the sequences of the insertions are quite distinct for the two groups. The variation in the signature sequence reflects the significant evolutionary divergence that has occurred at this site since it first appeared in a common ancestor of both groups.

overrepresented, which limits the usefulness of the matrix in identifying homologs that are somewhat distantly related. Tests have shown that the Blosum62 table provides the most reliable alignments over a wide range of protein families, and it is the default table in many sequence alignment programs.
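A substitution matrix is applied by summing the scores of the aligned residue pairs. The minimal Python sketch below scores an ungapped alignment this way. It includes only a dozen entries transcribed from the Blosum62 table (a real program loads the full 20 × 20 matrix), and the function names and the short sequences are hypothetical, chosen for illustration.

```python
# A small subset of Blosum62 entries (illustration only; a real aligner uses
# the full 20 x 20 table). The matrix is symmetric, so each pair is stored once.
PARTIAL_BLOSUM62 = {
    ("I", "I"): 4, ("L", "L"): 4, ("V", "V"): 4,
    ("K", "K"): 5, ("R", "R"): 5, ("E", "E"): 5, ("D", "D"): 6,
    ("I", "L"): 2, ("I", "V"): 3, ("L", "V"): 1,
    ("K", "R"): 2, ("D", "E"): 2,
}

def pair_score(a: str, b: str) -> int:
    """Look up a residue pair in either order; raises KeyError if not in the subset."""
    if (a, b) in PARTIAL_BLOSUM62:
        return PARTIAL_BLOSUM62[(a, b)]
    return PARTIAL_BLOSUM62[(b, a)]

def ungapped_score(seq1: str, seq2: str) -> int:
    """Sum the matrix scores position by position for two equal-length segments."""
    if len(seq1) != len(seq2):
        raise ValueError("an ungapped alignment needs equal-length sequences")
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))

def percent_identity(seq1: str, seq2: str) -> float:
    """Percentage of positions with identical residues."""
    matches = sum(a == b for a, b in zip(seq1, seq2))
    return 100.0 * matches / len(seq1)

# Hypothetical segments: one identical position, three conservative substitutions.
print(percent_identity("ILKD", "VLRE"))  # 25.0
print(ungapped_score("ILKD", "VLRE"))    # 11  (3 + 4 + 2 + 2)
```

Counting identities alone would treat the I/V, K/R, and D/E pairs like any other mismatch; the matrix rewards them, which is what allows a scoring table to pick out more distantly related homologs.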

For most efforts to find homologies and explore evolutionary relationships, protein sequences (derived either directly from protein sequencing or from the sequencing of the DNA encoding the protein) are superior to nongenic nucleic acid sequences (those that do not encode a protein or functional RNA). For a nucleic acid, with its four different types of residues, random alignment of nonhomologous sequences will generally yield matches for at least 25% of the positions. Introduction of a few gaps can often increase the fraction of matched residues to 40% or more, and the probability of chance alignment of unrelated sequences becomes quite high. The 20 different amino acid residues in proteins greatly lower the probability of uninformative chance alignments of this type.
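The chance-match figures above follow from simple probability: if residues are drawn independently, the probability that two aligned positions are identical is the sum of the squared residue frequencies. The Python sketch below assumes independent positions and, for the first two cases, equal residue frequencies. The skewed nucleotide composition in the third case is a hypothetical example; any departure from uniform composition raises the chance-match rate above 1/4, which is why random nucleic acid alignments match at at least 25% of positions.

```python
import random
from fractions import Fraction

def expected_identity(frequencies):
    """Probability that two independently drawn residues are identical:
    the sum over residue types of (frequency squared)."""
    return sum(p * p for p in frequencies)

def uniform(alphabet):
    """Equal frequency for every residue type (a simplifying assumption)."""
    return [Fraction(1, len(alphabet))] * len(alphabet)

def simulated_identity(alphabet, length, trials, seed=0):
    """Fraction of identical positions in ungapped alignments of random sequences."""
    rng = random.Random(seed)
    matches = 0
    for _ in range(trials):
        s1 = rng.choices(alphabet, k=length)
        s2 = rng.choices(alphabet, k=length)
        matches += sum(a == b for a, b in zip(s1, s2))
    return matches / (length * trials)

NUCLEOTIDES = "ACGT"
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

print(expected_identity(uniform(NUCLEOTIDES)))  # 1/4
print(expected_identity(uniform(AMINO_ACIDS)))  # 1/20
# Hypothetical GC-rich composition: G and C at 35% each, A and T at 15% each.
print(expected_identity([Fraction(7, 20)] * 2 + [Fraction(3, 20)] * 2))  # 29/100
print(simulated_identity(NUCLEOTIDES, 200, 500))  # close to 0.25
```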

The programs used to generate a sequence alignment are complemented by methods that test the reliability of the alignments. A common computerized test is to shuffle the amino acid sequence of one of the proteins being compared to produce a random sequence, then instruct the program to align the shuffled sequence with the other, unshuffled one. Scores are assigned to the new alignment, and the shuffling and alignment process is repeated many times. The original alignment, before shuffling, should have a score significantly higher than any of those within the distribution of scores generated by the random alignments; this increases the confidence that the sequence alignment has identified a pair of homologs. Note that the absence of a significant alignment score does not necessarily mean that no evolutionary relationship exists between two proteins. As we shall see in Chapter 4, three-dimensional structural similarities sometimes reveal evolutionary relationships where sequence homology has been wiped away by time.
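The shuffle test can be sketched as follows. The scoring scheme (match +2, mismatch -1, linear gap penalty -2), the function names, and the two sequences are hypothetical choices for illustration; real alignment programs use a substitution matrix and typically more elaborate gap penalties. The sketch scores a basic Needleman-Wunsch global alignment, then summarizes how far the real score lies above the shuffled distribution, both as a z-score and as the fraction of shuffles that score at least as well.

```python
import random
import statistics

# Hypothetical scoring scheme for illustration only (not a published matrix).
MATCH, MISMATCH, GAP = 2, -1, -2

def global_score(a, b):
    """Needleman-Wunsch global alignment score with a linear gap penalty,
    keeping only the previous row of the dynamic-programming table."""
    prev = [j * GAP for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * GAP] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            curr[j] = max(diag, prev[j] + GAP, curr[j - 1] + GAP)
        prev = curr
    return prev[-1]

def shuffle_test(a, b, n_shuffles=200, seed=1):
    """Score the real alignment, then alignments of shuffled copies of a against b.
    Returns (real score, z-score, fraction of shuffles scoring >= real)."""
    rng = random.Random(seed)
    real = global_score(a, b)
    scores = []
    for _ in range(n_shuffles):
        residues = list(a)
        rng.shuffle(residues)  # same amino acid composition, order destroyed
        scores.append(global_score("".join(residues), b))
    z = (real - statistics.mean(scores)) / statistics.stdev(scores)
    frac = sum(s >= real for s in scores) / n_shuffles
    return real, z, frac

# Hypothetical sequences differing at a single position.
seq_a = "MKTAYIAKQRQISFVKSHFSRQ"
seq_b = "MKTAYIAKQRQLSFVKSHFSRQ"
real, z, frac = shuffle_test(seq_a, seq_b)
print(real, round(z, 1), frac)
```

A large z-score together with a fraction of zero indicates that no shuffled sequence came close to the real score. As noted above, the converse does not hold: a low score alone does not show that two proteins are unrelated.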

Using a protein family to explore evolution requires the identification of family members with similar molecular functions in the widest possible range of organisms. Information from the family can then be used to trace the evolution of those organisms. By analyzing the sequence divergence in selected protein families, investigators can segregate organisms into classes based on their evolutionary relationships. This information must be reconciled with more classical examinations of the physiology and biochemistry of the organisms.

Certain segments of a protein sequence may be found in the organisms of one taxonomic group but not in other groups; these segments can be used as signature sequences for the group in which they are found. An example of a signature sequence is an insertion of 12 amino acids near the amino terminus of the EF-1α/EF-Tu proteins in all archaebacteria and eukaryotes but not in other types of bacteria (Fig. 3-32). This signature is one of many biochemical clues that can help establish the evolutionary relatedness of eukaryotes and archaebacteria. Similarly, the major taxa of bacteria can be distinguished by signature sequences in several different proteins. The β and γ proteobacteria have signature sequences in the Hsp70 and DNA gyrase protein families (families of proteins involved in protein folding and DNA replication, respectively) that are not present in any other bacteria, including the other proteobacteria. The other types of proteobacteria (α, δ, ε), along with the β and γ proteobacteria, share a separate Hsp70 signature sequence and a signature in alanyl-tRNA synthetase (an enzyme of protein synthesis) that are not present in other bacteria. The appearance of unique signatures in the β and γ proteobacteria, in addition to those they share with the other proteobacteria, suggests that the α, δ, and ε proteobacteria arose before their β and γ cousins.
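Detecting an insertion signature of this kind amounts to scanning a multiple sequence alignment for columns that are filled in every member of one group and gapped in every member of another. A minimal Python sketch, assuming the alignment has already been computed and gaps are written as "-"; the function name and the four short sequences are hypothetical and are not real EF-1α/EF-Tu sequences (the real insertion is 12 residues long, this toy one is 4).

```python
# Hypothetical aligned sequences (gaps written as "-"); not real EF-1alpha/EF-Tu data.
ALIGNMENT = {
    "archaeon_x":  "MSKEKPLDQTHINVV",
    "eukaryote_y": "MGKEKSVEETHINIV",
    "bacterium_z": "MSKEK----THVNVG",
    "bacterium_w": "MAKEK----THINVG",
}

def signature_insertions(alignment, group_with, group_without):
    """Return (start, end) column ranges (0-based, end exclusive) that are filled
    in every member of group_with and gapped in every member of group_without."""
    length = len(next(iter(alignment.values())))
    is_signature = [
        all(alignment[name][col] != "-" for name in group_with)
        and all(alignment[name][col] == "-" for name in group_without)
        for col in range(length)
    ]
    runs, start = [], None
    # A trailing False acts as a sentinel so a run reaching the last column is closed.
    for col, flag in enumerate(is_signature + [False]):
        if flag and start is None:
            start = col
        elif not flag and start is not None:
            runs.append((start, col))
            start = None
    return runs

with_insert = {"archaeon_x", "eukaryote_y"}
without_insert = {"bacterium_z", "bacterium_w"}
for start, end in signature_insertions(ALIGNMENT, with_insert, without_insert):
    print(start, end, {name: ALIGNMENT[name][start:end] for name in sorted(with_insert)})
# prints: 5 9 {'archaeon_x': 'PLDQ', 'eukaryote_y': 'SVEE'}
```

As in Figure 3-32, the insertion occupies the same alignment columns in both groups even though its residues differ between them.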

By considering the entire sequence of a protein, researchers can now construct more elaborate evolutionary trees with many species in each taxonomic group. Figure 3-33 presents one such tree for bacteria, based on sequence divergence in the protein GroEL (a protein present in all bacteria that assists in the proper folding of proteins). The tree can be refined by basing it on the sequences of multiple proteins and by supplementing the sequence information with data on the unique biochemical and physiological properties of each species. There are many methods for generating trees, each with its own advantages and shortcomings, and many ways to represent the resulting evolutionary relationships. In Figure 3-33, the free end points of lines are called "external nodes"; each represents an extant species, and each is so labeled. The points where two lines come together, the "internal nodes," represent extinct ancestor species. In most representations (including Fig. 3-33), the lengths of the lines connecting the nodes are proportional to the number of amino acid substitutions separating one species from another. If we trace two extant species to a common internal node (representing the common ancestor of the two species), the length of the branch connecting each external node to the internal node represents the number of amino acid substitutions separating one extant species from this ancestor. The sum of the lengths of all the line segments that connect an extant species to another extant species through a common ancestor reflects the number of substitutions separating the two extant species. To determine how much time was needed for the various species to diverge, the tree must be calibrated by comparing it with information from the fossil record and other sources.
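The rule for reading distances off such a tree (add the branch lengths along the path that joins two external nodes through their common ancestor) fits in a few lines of Python. The tree, node names, and branch lengths below are hypothetical; each node records only its parent and the length of the branch above it, in substitutions per site.

```python
# Hypothetical rooted tree: node -> (parent, length of the branch up to that
# parent, in substitutions per site). The root has no parent.
TREE = {
    "species_A": ("anc_AB", 0.10),
    "species_B": ("anc_AB", 0.05),
    "species_C": ("anc_ABC", 0.20),
    "anc_AB": ("anc_ABC", 0.08),
    "anc_ABC": (None, 0.0),
}

def path_to_root(node):
    """Walk from node up to the root, recording each node and its distance from the start."""
    path, dist = [], 0.0
    while node is not None:
        path.append((node, dist))
        parent, length = TREE[node]
        dist += length
        node = parent
    return path

def tree_distance(x, y):
    """Return the most recent common ancestor of x and y and the summed
    branch lengths of the path joining them through that ancestor."""
    from_x = dict(path_to_root(x))
    for node, dist_from_y in path_to_root(y):
        # The first ancestor of y that is also an ancestor of x is their common ancestor.
        if node in from_x:
            return node, from_x[node] + dist_from_y
    raise ValueError("nodes are not in the same tree")

ancestor, d = tree_distance("species_A", "species_C")
print(ancestor, round(d, 2))  # anc_ABC 0.38
```

Converting such distances into elapsed time would still require the calibration against the fossil record described above.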

As more sequence information is made available in databases, we can generate evolutionary trees based on a variety of different proteins. Some proteins evolve faster than others, or change faster within one group of species than another. A large protein, with many variable amino acid residues, may exhibit a few differences between two closely related species. Another, smaller protein may be identical in the same two species. For many reasons, some details of an evolutionary tree based on the sequences of one protein may differ from those of a tree based on the sequences of another protein. Increasingly sophisticated analyses using the sequences of many different proteins can provide an exquisitely detailed and accurate picture of evolutionary relationships. The story is a work in progress, and the questions being asked and answered are fundamental to how humans view themselves and the world around them. The field of molecular evolution promises to be among the most vibrant of the scientific frontiers in the twenty-first century.
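One simple way to compare two species protein by protein is the p-distance, the fraction of aligned sites that differ. The sketch below uses hypothetical sequences to show a larger protein differing at two sites while a smaller one is identical, and how pooling (concatenating) the two proteins gives an intermediate value. It also applies the Poisson correction, d = -ln(1 - p), one common way of allowing for multiple substitutions at the same site; practical tree-building programs generally rely on more elaborate substitution models.

```python
import math

def p_distance(seq1, seq2):
    """Proportion of aligned positions (gaps excluded) at which the residues differ."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(a != b for a, b in pairs) / len(pairs)

def poisson_distance(p):
    """Poisson-corrected substitutions per site, d = -ln(1 - p); defined for p < 1."""
    return -math.log(1.0 - p)

# Hypothetical orthologs from two closely related species.
large_1 = "ACDEFGHIKLMNPQRSTVWY" * 2                                  # 40 residues
large_2 = large_1[:5] + "W" + large_1[6:30] + "A" + large_1[31:]      # 2 substitutions
small_1 = small_2 = "MKTAYIAKQR"                                      # 10 residues, identical

print(p_distance(large_1, large_2))                      # 0.05
print(p_distance(small_1, small_2))                      # 0.0
print(p_distance(large_1 + small_1, large_2 + small_2))  # 0.04 (pooled)
print(round(poisson_distance(0.05), 4))                  # 0.0513
```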

SUMMARY 3.5 Protein Sequences and Evolution

■ Protein sequences are a rich source of information about protein structure and function, as well as the evolution of life on this planet. Sophisticated methods are being developed to trace evolution by analyzing the slow changes that accumulate over time in the amino acid sequences of homologous proteins.

FIGURE 3-33 Evolutionary tree derived from amino acid sequence comparisons. A bacterial evolutionary tree, based on the sequence divergence observed in the GroEL family of proteins. Also included in this tree (lower right) are the chloroplasts (chl.) of some nonbacterial species. [Tree diagram not reproduced; scale bar, 0.1 substitutions/site.]
