Protein Sequences and Evolution

The simple string of letters denoting the amino acid sequence of a given protein belies the wealth of information this sequence holds. As more protein sequences have become available, the development of more powerful methods for extracting information from them has become a major biochemical enterprise. Each protein's function relies on its three-dimensional structure, which in turn is determined largely by its primary structure. Thus, the biochemical information conveyed by a protein sequence is in principle limited only by our own understanding of structural and functional principles. On a different level of inquiry, protein sequences are beginning to tell us how the proteins evolved and, ultimately, how life evolved on this planet.

Protein Sequences Can Elucidate the History of Life on Earth

The field of molecular evolution is often traced to Emile Zuckerkandl and Linus Pauling, whose work in the mid-1960s advanced the use of nucleotide and protein sequences to explore evolution. The premise is deceptively straightforward. If two organisms are closely related, the sequences of their genes and proteins should be similar. The sequences increasingly diverge as the evolutionary distance between two organisms increases. The promise of this approach began to be realized in the 1970s, when Carl Woese used ribosomal RNA sequences to define archaebacteria as a group of living organisms distinct from other bacteria and eukaryotes (see Fig. 1-4). Protein sequences offer an opportunity to greatly refine the available information. With the advent of genome projects investigating organisms from bacteria to humans, the number of available sequences is growing at an enormous rate. This information can be used to trace biological history. The challenge is in learning to read the genetic hieroglyphics.
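The premise that sequences diverge as evolutionary distance grows can be made concrete with a very simple measure: the fraction of aligned positions at which two sequences differ. The Python sketch below is only an illustration under stated assumptions. The function name `p_distance`, the species labels, and the three short sequences are all hypothetical and invented for this example; the sequences are assumed to be already aligned and of equal length.

```python
def p_distance(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not a:
        raise ValueError("sequences must not be empty")
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)


# Hypothetical, hand-made sequences (not real proteins).
toy_sequences = {
    "species_A": "MKTAYIAKQR",
    "species_B": "MKTAYIAKQK",  # differs from species_A at one position
    "species_C": "MRTSYLAKHK",  # differs from species_A at five positions
}

names = sorted(toy_sequences)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        d = p_distance(toy_sequences[first], toy_sequences[second])
        print(first, second, d)
```

In this toy data set, species_A and species_B would be read as the closest relatives because their sequences differ least. Real analyses use more elaborate distance corrections, which this sketch does not attempt.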

Evolution has not taken a simple linear path. Complexities abound in any attempt to mine the evolutionary information stored in protein sequences. For a given protein, the amino acid residues essential for the activity of the protein are conserved over evolutionary time. The residues that are less important to function may vary over time—that is, one amino acid may substitute for another—and these variable residues can provide the information used to trace evolution. Amino acid substitutions are not always random, however. At some positions in the primary structure, the need to maintain protein function may mean that only particular amino acid substitutions can be tolerated. Some proteins have more variable amino acid residues than others. For these and other reasons, proteins can evolve at different rates.
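The distinction between conserved and variable residues can be illustrated by scanning the columns of a set of aligned homologs. The following Python sketch assumes the sequences are already aligned to equal length. The function names `column_profile` and `conserved_positions` and the four short sequences are hypothetical, chosen only to show the idea.

```python
from collections import Counter


def column_profile(aligned):
    """For each alignment column, return (most common residue, its frequency)."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("need at least one sequence, all of equal length")
    profile = []
    for column in zip(*aligned):
        residue, count = Counter(column).most_common(1)[0]
        profile.append((residue, count / len(aligned)))
    return profile


def conserved_positions(aligned):
    """Indices (0-based) of columns where every sequence has the same residue."""
    return [i for i, (_, freq) in enumerate(column_profile(aligned)) if freq == 1.0]


# Hypothetical aligned fragments from four homologs (invented for illustration).
toy_alignment = ["GDSGGP", "GDSGAP", "GESGSP", "GDSGTP"]

conserved = conserved_positions(toy_alignment)
variable = [i for i in range(len(toy_alignment[0])) if i not in conserved]
print("conserved columns:", conserved)  # [0, 2, 3, 5]
print("variable columns:", variable)    # [1, 4]
```

Here columns 0, 2, 3, and 5 are invariant and would be candidates for functionally essential residues, whereas columns 1 and 4 vary and carry the kind of information used to trace evolution. Column 1 tolerates only two residues in this toy set, which loosely mirrors the point that some positions accept only particular substitutions.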

Another complicating factor in tracing evolutionary history is the rare transfer of a gene or group of genes from one organism to another, a process called lateral gene transfer. The transferred genes may be quite similar to the genes they were derived from in the original organism, whereas most other genes in the same two organisms may be quite distantly related. An example of lateral gene transfer is the recent rapid spread of antibiotic-resistance genes in bacterial populations. The proteins derived from these transferred genes would not be good candidates for the study of bacterial evolution, because they share only a very limited evolutionary history with their "host" organisms.

E. coli      TGNRTIAVYDLGGGTFDISIIEID
B. subtilis  DEDQTILLYDLGGGTFDVSILELG

FIGURE 3-30 Aligning protein sequences with the use of gaps. Shown here is the sequence alignment of a short section of the EF-Tu protein from two well-studied bacterial species, E. coli and Bacillus subtilis.

The study of molecular evolution generally focuses on families of closely related proteins. In most cases, the families chosen for analysis have essential functions in cellular metabolism that must have been present in the earliest viable cells, thus greatly reducing the chance that they were introduced relatively recently by lateral gene transfer. For example, a protein called EF-1α (elongation factor 1α) is involved in the synthesis of proteins in all eukaryotes. A similar protein, EF-Tu, with the same function, is found in bacteria. Similarities in sequence and function indicate that EF-1α and EF-Tu are members of a family of proteins that share a common ancestor. The members of protein families are called homologous proteins, or homologs. The concept of a homolog can be further refined. If two proteins within a family (that is, two homologs) are present in the same species, they are referred to as paralogs. Homologs from different species are called orthologs (see Fig. 1-37). The process of tracing evolution involves first identifying suitable families of homologous proteins and then using them to reconstruct evolutionary paths.

Homologs are identified using increasingly powerful computer programs that can directly compare two or more chosen protein sequences, or can search vast databases to find the evolutionary relatives of one selected protein sequence. The electronic search process can be thought of as sliding one sequence past the other until a section with a good match is found. Within this sequence alignment, a positive score is assigned for each position where the amino acid residues in the two sequences are identical—the value of the score varying from one program to the next—to provide a measure of the quality of the alignment. The process has some complications. Sometimes the proteins being compared match well at, say, two sequence segments, and these segments are connected by less related sequences of different lengths. Thus the two matching segments cannot be aligned at the same time. To handle this, the computer program introduces "gaps" in one of the sequences to bring the matching segments into register (Fig. 3-30).
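The sliding comparison and the introduction of gaps described above can be sketched in Python. This is a minimal illustration, not the method of any particular alignment program. The first function slides one sequence past the other without gaps and scores identities. The second uses a standard dynamic-programming global alignment (in the style of Needleman and Wunsch) to place gaps. The function names, the scoring values (+1 for an identity, 0 for a mismatch, -1 per gap position), and the two toy sequences are assumptions made for this example.

```python
def identity_score(a, b):
    """Count positions where two equal-length segments have the same residue."""
    return sum(1 for x, y in zip(a, b) if x == y)


def best_ungapped_offset(seq1, seq2):
    """Slide seq2 past seq1; return (offset, score) of the best overlap.

    offset is the index in seq1 that lines up with seq2[0] (may be negative).
    """
    best = (0, -1)
    for offset in range(-(len(seq2) - 1), len(seq1)):
        start1, start2 = max(offset, 0), max(-offset, 0)
        length = min(len(seq1) - start1, len(seq2) - start2)
        score = identity_score(seq1[start1:start1 + length],
                               seq2[start2:start2 + length])
        if score > best[1]:
            best = (offset, score)
    return best


def global_align(seq1, seq2, match=1, mismatch=0, gap=-1):
    """Global alignment by dynamic programming; returns (score, row1, row2)."""
    n, m = len(seq1), len(seq2)

    def pair(i, j):
        return match if seq1[i - 1] == seq2[j - 1] else mismatch

    # table[i][j] = best score for aligning seq1[:i] with seq2[:j]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        table[i][0] = i * gap
    for j in range(1, m + 1):
        table[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            table[i][j] = max(table[i - 1][j - 1] + pair(i, j),
                              table[i - 1][j] + gap,
                              table[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one best alignment.
    row1, row2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and table[i][j] == table[i - 1][j - 1] + pair(i, j):
            row1.append(seq1[i - 1])
            row2.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and table[i][j] == table[i - 1][j] + gap:
            row1.append(seq1[i - 1])
            row2.append("-")
            i -= 1
        else:
            row1.append("-")
            row2.append(seq2[j - 1])
            j -= 1
    return table[n][m], "".join(reversed(row1)), "".join(reversed(row2))


# Toy sequences (invented): two matching segments joined by linkers of
# different lengths, so no single ungapped offset aligns both segments.
first = "YDLGGGTFDAAAKLMNP"
second = "YDLGGGTFDKLMNP"

print(best_ungapped_offset(first, second))  # (0, 9): only the first segment matches
score, row1, row2 = global_align(first, second)
print(score)  # 11: fourteen identities minus three gap positions
print(row1)   # YDLGGGTFDAAAKLMNP
print(row2)   # YDLGGGTFD---KLMNP
```

Without gaps, the best overlap captures only the first matching segment. With three gap positions inserted into the shorter sequence, both segments come into register at once. The penalty attached to each gap keeps the procedure from inserting gaps freely just to create matches; the specific value used here is arbitrary.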
