[Figure residue (labels "H. sapiens (completed)" and "S. pombe") with caption fragment: ". . . Haemophilus influenzae), yeast (S. cerevisiae), a nematode worm (C. elegans), the fruit fly (D. melanogaster), and a plant (A. thaliana). Completed sequences for mammalian genomes, including the human genome, began to emerge in 2000. Each genome project has a website that serves as a central repository for the latest data."]

[Text fragment: ". . . to provide high-quality sequence data that are contiguous throughout the genome."]

The Human Genome Project marks the culmination of twentieth-century biology and promises a vastly changed scientific landscape for the new century. The human genome is only part of the story, as the genomes of many other species are also being (or have been) sequenced, including the yeasts Saccharomyces cerevisiae (completed in 1996) and Schizosaccharomyces pombe (2002), the nematode Caenorhabditis elegans (1998), the fruit fly Drosophila melanogaster (2000), the plant Arabidopsis thaliana (2000), the mouse Mus musculus (2002), zebrafish, and dozens of bacterial and archaebacterial species (Fig. 9-18). Most of the early efforts have been focused on species commonly used in laboratories. However, genome sequencing is destined to branch out to many other species as experience grows and technology improves. Broad efforts to map genes, attempts to identify new proteins and disease genes, and many other initiatives are currently under way.

The result is a database with the potential not only to fuel rapid advances in biology but also to change the way that humans think about themselves. Early insights provided by the human genome sequence range from the intriguing to the profound. We are not as complicated as we thought. Decades-old estimates that humans possessed about 100,000 genes within the approximately 3.2 × 10⁹ bp of the human genome have been supplanted by the discovery that we have only 30,000 to 35,000 genes. This is perhaps three times as many genes as a fruit fly (with 13,000) and twice as many as a nematode worm (18,000). Although humans evolved relatively recently, much of the human genome is very old: of 1,278 protein families identified in one early screen, only 94 were unique to vertebrates. However, while we share many protein domain types with plants, worms, and flies, we use these domains in more complex arrangements. Alternative modes of gene expression (Chapter 26) allow the production of more than one protein from a single gene, a process that humans and other vertebrates engage in more than do bacteria, worms, or any other forms of life. This allows for greater complexity in the proteins generated from our gene complement.

We now know that only 1.1% to 1.4% of our DNA actually encodes proteins (Fig. 9-19). More than 50% of our genome consists of short, repeated sequences, the vast majority of which—about 45% of our genome in all—come from transposons, short movable DNA sequences that are molecular parasites (Chapter 25). Many of the transposons have been there a long time, now altered so that they can no longer move to new genomic locations. Others are still actively moving at low frequencies, helping to make the genome an ever-dynamic and evolving entity. At least a few transposons have been co-opted by their host and appear to serve useful cellular functions.
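The percentages quoted above translate into absolute amounts of DNA with a single multiplication. The following Python sketch assumes the text's round figure of 3.2 × 10⁹ bp for the genome; the constant and function names are illustrative only.

```python
# Approximate human genome size used in the text, in base pairs.
GENOME_BP = 3.2e9

# Fractions quoted in the text.
CODING_FRACTION_LOW = 0.011   # 1.1% encodes protein
CODING_FRACTION_HIGH = 0.014  # 1.4% encodes protein
TRANSPOSON_FRACTION = 0.45    # about 45% derived from transposons


def fraction_to_megabases(fraction: float, genome_bp: float = GENOME_BP) -> float:
    """Convert a fraction of the genome into millions of base pairs (Mb)."""
    return fraction * genome_bp / 1e6


if __name__ == "__main__":
    low = fraction_to_megabases(CODING_FRACTION_LOW)
    high = fraction_to_megabases(CODING_FRACTION_HIGH)
    te = fraction_to_megabases(TRANSPOSON_FRACTION)
    print(f"Protein-coding DNA: {low:.0f}-{high:.0f} Mb")
    print(f"Transposon-derived DNA: {te:.0f} Mb")
```

With these inputs, protein-coding DNA comes to roughly 35 to 45 million base pairs, while transposon-derived DNA accounts for about 1.4 billion.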

FIGURE 9-19 Snapshot of the human genome. The chart shows the proportions of our genome made up of various types of sequences. [Recovered chart label: translated into protein, 1.1%–1.4%.]

What does all this information tell us about how much one human differs from another? Within the human population are millions of single-base differences, called single nucleotide polymorphisms, or SNPs (pronounced "snips"). Each human differs from the next by about 1 bp in every 1,000 bp. From these small genetic differences arises the human variety we are all aware of: differences in hair color, eyesight, allergies to medication, foot size, and even (to some unknown degree) behavior. Some of the SNPs are linked to particular human populations and can provide important information about human migrations that occurred thousands of years ago and about our more distant evolutionary past.
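The SNP rate above implies that two human genomes differ at a few million positions. The sketch below is minimal and assumes the text's figures (about 1 difference per 1,000 bp across 3.2 × 10⁹ bp) and two sequences already aligned to the same length. Real variant discovery must also handle alignment, sequencing errors, and the two copies of each chromosome, all of which this toy comparison ignores; the function names are hypothetical.

```python
GENOME_BP = 3.2e9      # approximate genome size from the text
SNP_RATE = 1 / 1000    # about 1 differing base per 1,000 bp


def expected_differences(genome_bp: float = GENOME_BP, rate: float = SNP_RATE) -> float:
    """Expected number of single-base differences between two genomes."""
    return genome_bp * rate


def count_mismatches(seq_a: str, seq_b: str) -> int:
    """Count positions where two pre-aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Compare case-insensitively, position by position.
    return sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a != b)


if __name__ == "__main__":
    print(f"Expected differences: {expected_differences():,.0f}")
    # Two made-up, already-aligned fragments that differ at one position.
    print(count_mismatches("ACGTTGCAAC", "ACGTTACAAC"))
```

The first line prints about 3,200,000, consistent with the "millions" of SNPs mentioned in the text.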

As spectacular as this advance is, the sequencing of the human genome is easy compared with what comes next—the effort to understand all the information in each genome. The genome sequences being added monthly to international databases are roadmaps, parts of which are written in a language we do not yet understand. However, they have great utility in catalyzing the discovery of new proteins and processes affecting every aspect of biochemistry, as will become apparent in chapters to come.

SUMMARY 9.2 From Genes to Genomes

■ The science of genomics broadly encompasses the study of genomes and their gene content.

■ Genomic DNA segments can be organized in libraries—such as genomic libraries and cDNA libraries—with a wide range of designs and purposes.

■ The polymerase chain reaction (PCR) can be used to amplify selected DNA segments from a DNA library or an entire genome.

■ In an international cooperative research effort, the genomes of many organisms, including that of humans, have been sequenced in their entirety and are now available in public databases.

9.3 From Genomes to Proteomes

A gene is not simply a DNA sequence; it is information that is converted to a useful product (a protein or functional RNA molecule) when and if needed by the cell. The first and most obvious step in exploring a large sequenced genome is to catalog the products of the genes within that genome. Genes that encode RNA as their final product are somewhat harder to identify than are protein-encoding genes, and even the latter can be very difficult to spot in a vertebrate genome. The explosion of DNA sequence information has also revealed a sobering truth. Despite many years of biochemical advances, there are still thousands of proteins in every eukaryotic cell (and quite a few in bacteria) that we know nothing about. These proteins may have functions in processes not yet discovered, or may contribute in unexpected ways to processes we think we understand. In addition, the genomic sequences tell us nothing about the three-dimensional structure of proteins or how proteins are modified after they are synthesized. The proteins, with their myriad critical functions in every cell, are now becoming the focus of new strategies for whole-cell biochemistry.

The complement of proteins expressed by a genome is called its proteome, a term that first appeared in the research literature in 1995. This concept rapidly evolved into a separate field of investigation, called proteomics. The problem addressed by proteomics research is straightforward, although the solution is not. Each genome presents us with thousands of genes encoding proteins, and ideally we want to know the structure and function of all those proteins. Given that many proteins offer surprises even after years of study, the investigation of an entire proteome is a daunting enterprise. Simply discovering the function of new proteins requires intensive work. Biochemists can now apply shortcuts in the form of a broad array of new and updated technologies.

Protein function can be described on three levels. Phenotypic function describes the effects of a protein on the entire organism. For example, the loss of the protein may lead to slower growth of the organism, an altered development pattern, or even death. Cellular function is a description of the network of interactions engaged in by a protein at the cellular level. Interactions with other proteins in the cell can help define the kinds of metabolic processes in which the protein participates. Finally, molecular function refers to the precise biochemical activity of a protein, including details such as the reactions an enzyme catalyzes or the ligands a receptor binds.

For several genomes, such as those of the yeast Saccharomyces cerevisiae and the plant Arabidopsis, a massive effort is under way to inactivate each gene by genetic engineering and to investigate the effect on the organism. If the growth patterns or other properties of the organism change (or if it does not grow at all), this provides information on the phenotypic function of the protein product of the gene.

There are three other main paths to investigating protein function: (1) sequence and structural comparisons with genes and proteins of known function, (2) determination of when and where a gene is expressed, and (3) investigation of the interactions of the protein with other proteins. We discuss each of these approaches in turn.

Sequence or Structural Relationships Provide Information on Protein Function

One of the important reasons to sequence many genomes is to provide a database that can be used to assign gene functions by genome comparisons, an enterprise referred to as comparative genomics. Sometimes a newly discovered gene is related by sequence homologies to a gene previously studied in another or the same species, and its function can be entirely or partly defined by that relationship. Such genes, of different species but possessing a clear sequence and functional relationship to each other, are called orthologs. Genes similarly related to each other within a single species are called paralogs (see Fig. 1-37). If the function of a gene has been characterized for one species, this information can be used to assign gene function to the ortholog found in the second species. Such assignments are easiest to make when comparing genomes from relatively closely related species, such as mouse and human, although many clearly orthologous genes have been identified in species as distant as bacteria and humans. Sometimes even the order of genes on a chromosome is conserved over large segments of the genomes of closely related species (Fig. 9-20). Conserved gene order, called synteny, provides additional evidence for an orthologous relationship between genes at identical locations within the related segments.
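One common computational heuristic for flagging candidate orthologs, not described in the text itself, is the reciprocal best hit: gene a in one species and gene b in another are paired when each is the other's highest-scoring match. The sketch below assumes the similarity scores have already been computed by some sequence-comparison tool; the gene names and scores are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical similarity scores (higher = more similar).
# Keys are (gene in species A, gene in species B).
Scores = Dict[Tuple[str, str], float]


def best_hits(scores: Scores, from_a: bool) -> Dict[str, str]:
    """For each gene in one species, return its highest-scoring partner."""
    best: Dict[str, Tuple[str, float]] = {}
    for (gene_a, gene_b), score in scores.items():
        query, target = (gene_a, gene_b) if from_a else (gene_b, gene_a)
        # Strict ">" keeps the first partner seen when scores tie.
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(scores: Scores) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit AND a is b's best hit."""
    a_to_b = best_hits(scores, from_a=True)
    b_to_a = best_hits(scores, from_a=False)
    return sorted((a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a)


if __name__ == "__main__":
    example: Scores = {
        ("mouseGene1", "humanGeneX"): 950.0,
        ("mouseGene1", "humanGeneY"): 400.0,
        ("mouseGene2", "humanGeneY"): 880.0,
        ("mouseGene2", "humanGeneX"): 300.0,
        ("mouseGene3", "humanGeneY"): 500.0,
    }
    print(reciprocal_best_hits(example))
```

Here mouseGene3 is left unpaired: its best hit, humanGeneY, matches mouseGene2 better. Reciprocal best hits can miss true orthologs when gene duplication has produced paralogs, which is one reason conserved gene order (synteny) is useful as a second line of evidence.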

Alternatively, certain sequences associated with particular structural motifs (Chapter 4) may be identified within a protein. The presence of a structural motif may suggest that the protein, say, catalyzes ATP hydrolysis, binds to DNA, or forms a complex with zinc ions, helping to define its molecular function. These relationships are determined with the aid of increasingly sophisticated computer programs, limited only by the current information on gene and protein structure and our capacity to associate sequences with particular structural motifs.
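Motif searches of this kind are often expressed as sequence patterns. As one illustration (the text does not name a specific motif), the P-loop or Walker A motif found in many ATP- and GTP-binding proteins is commonly summarized as [AG]-x(4)-G-K-[ST]. This Python sketch translates that pattern into a regular expression and scans a short illustrative sequence. A match only suggests a possible nucleotide-binding site; it does not demonstrate function.

```python
import re
from typing import List, Tuple

# P-loop (Walker A) consensus, often written [AG]-x(4)-G-K-[ST]:
# A or G, any four residues, then G, K, and S or T.
P_LOOP = "[AG].{4}GK[ST]"


def find_motif(protein: str, pattern: str = P_LOOP) -> List[Tuple[int, str]]:
    """Return (1-based start, matched residues) for every match, including overlaps."""
    # Wrapping the pattern in a lookahead lets finditer report overlapping hits.
    lookahead = re.compile(f"(?=({pattern}))")
    return [(m.start() + 1, m.group(1)) for m in lookahead.finditer(protein.upper())]


if __name__ == "__main__":
    seq = "MTEYKLVVVGAGGVGKSALTIQ"  # short illustrative sequence
    print(find_motif(seq))
```

For the sequence shown, the scan reports a single hit, GAGGVGKS, starting at residue 10. Swapping in a different pattern string would search for other motifs in the same way.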

