76
Only when the structure of DNA was discovered in the early 1950s did it become clear how the hereditary information in cells is encoded in DNA’s sequence of nucleotides. The progress since then has been astounding. Fifty years later, we have complete genome sequences for many organisms, including humans, and we therefore know the maximum amount of information that is required to pro- duce a complex organism like ourselves. The limits on the hereditary informa- tion needed for life constrain the biochemical and structural features of cells and make it clear that biology is not infinitely complex. In this chapter, we explain how cells decode and use the information in their genomes. We shall see that much has been learned about how the genetic instructions written in an alphabet of just four “letters”—the four different nucleotides in DNA—direct the formation of a bacterium, a fruitfly, or a human. Nevertheless, we still have a great deal to discover about how the information stored in an organism’s genome produces even the simplest unicellular bacterium with 500 genes, let alone how it directs the development of a human with approximately 30,000 genes. An enormous amount of ignorance remains; many fascinating challenges therefore await the next generation of cell biologists. The problems cells face in decoding genomes can be appreciated by consid- ering a small portion of the genome of the fruit fly Drosophila melanogaster (Fig- ure 6–1). Much of the DNA-encoded information present in this and other genomes is used to specify the linear order—the sequence—of amino acids for every protein the organism makes. As described in Chapter 3, the amino acid sequence in turn dictates how each protein folds to give a molecule with a dis- tinctive shape and chemistry. When a particular protein is made by the cell, the corresponding region of the genome must therefore be accurately decoded. Addi- tional information encoded in the DNA of the genome specifies exactly when in the life of an organism and in which cell types each gene is to be expressed into protein. Since proteins are the main constituents of cells, the decoding of the genome determines not only the size, shape, biochemical properties, and behav- ior of cells, but also the distinctive features of each species on Earth. One might have predicted that the information present in genomes would be arranged in an orderly fashion, resembling a dictionary or a telephone directory. HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA TO PROTEIN THE RNA WORLD AND THE ORIGINS OF LIFE 299

HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

Only when the structure of DNA was discovered in the early 1950s did it becomeclear how the hereditary information in cells is encoded in DNA’s sequence ofnucleotides. The progress since then has been astounding. Fifty years later, wehave complete genome sequences for many organisms, including humans, andwe therefore know the maximum amount of information that is required to pro-duce a complex organism like ourselves. The limits on the hereditary informa-tion needed for life constrain the biochemical and structural features of cellsand make it clear that biology is not infinitely complex.

In this chapter, we explain how cells decode and use the information in theirgenomes. We shall see that much has been learned about how the geneticinstructions written in an alphabet of just four “letters”—the four differentnucleotides in DNA—direct the formation of a bacterium, a fruitfly, or a human.Nevertheless, we still have a great deal to discover about how the informationstored in an organism’s genome produces even the simplest unicellular bacteriumwith 500 genes, let alone how it directs the development of a human withapproximately 30,000 genes. An enormous amount of ignorance remains; manyfascinating challenges therefore await the next generation of cell biologists.

The problems cells face in decoding genomes can be appreciated by consid-ering a small portion of the genome of the fruit fly Drosophila melanogaster (Fig-ure 6–1). Much of the DNA-encoded information present in this and othergenomes is used to specify the linear order—the sequence—of amino acids forevery protein the organism makes. As described in Chapter 3, the amino acidsequence in turn dictates how each protein folds to give a molecule with a dis-tinctive shape and chemistry. When a particular protein is made by the cell, thecorresponding region of the genome must therefore be accurately decoded. Addi-tional information encoded in the DNA of the genome specifies exactly when inthe life of an organism and in which cell types each gene is to be expressed intoprotein. Since proteins are the main constituents of cells, the decoding of thegenome determines not only the size, shape, biochemical properties, and behav-ior of cells, but also the distinctive features of each species on Earth.

One might have predicted that the information present in genomes would bearranged in an orderly fashion, resembling a dictionary or a telephone directory.

HOW CELLS READ THEGENOME: FROM DNA TO PROTEIN

6FROM DNA TO RNA

FROM RNA TO PROTEIN

THE RNA WORLD AND THE ORIGINS OF LIFE

299

Page 2: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

300 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

color code for sequence similarityof genes identified

MWY

MW

MY

M

WY

W

Y

no similarityto MWY

M = mammalianW = C.elegansY = S. cerevisiae

known and predicted genesidentified on bottom strand of DNA

known and predicted genesidentified on top strand of DNA

transposable elements

%GC content6525

13 or more1 to 12none

length of bar indicates numberof correspondingcDNAs identifiedin databases

KEY:

100,000 nucleotide pairs

Page 3: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

Although the genomes of some bacteria seem fairly well organized, the genomesof most multicellular organisms, such as our Drosophila example, are surpris-ingly disorderly. Small bits of coding DNA (that is, DNA that codes for protein)are interspersed with large blocks of seemingly meaningless DNA. Some sectionsof the genome contain many genes and others lack genes altogether. Proteinsthat work closely with one another in the cell often have their genes located ondifferent chromosomes, and adjacent genes typically encode proteins that havelittle to do with each other in the cell. Decoding genomes is therefore no simplematter. Even with the aid of powerful computers, it is still difficult for researchersto locate definitively the beginning and end of genes in the DNA sequences ofcomplex genomes, much less to predict when each gene is expressed in the lifeof the organism. Although the DNA sequence of the human genome is known, itwill probably take at least a decade for humans to identify every gene and deter-mine the precise amino acid sequence of the protein it produces. Yet the cells inour body do this thousands of times a second.

The DNA in genomes does not direct protein synthesis itself, but insteaduses RNA as an intermediary molecule. When the cell needs a particular protein,the nucleotide sequence of the appropriate portion of the immensely long DNAmolecule in a chromosome is first copied into RNA (a process called transcrip-tion). It is these RNA copies of segments of the DNA that are used directly astemplates to direct the synthesis of the protein (a process called translation).The flow of genetic information in cells is therefore from DNA to RNA to protein(Figure 6–2). All cells, from bacteria to humans, express their genetic informa-tion in this way—a principle so fundamental that it is termed the central dogmaof molecular biology.

Despite the universality of the central dogma, there are important variationsin the way information flows from DNA to protein. Principal among these is thatRNA transcripts in eucaryotic cells are subject to a series of processing steps inthe nucleus, including RNA splicing, before they are permitted to exit from thenucleus and be translated into protein. These processing steps can criticallychange the “meaning” of an RNA molecule and are therefore crucial for under-standing how eucaryotic cells read the genome. Finally, although we focus onthe production of the proteins encoded by the genome in this chapter, we seethat for some genes RNA is the final product. Like proteins, many of these RNAsfold into precise three-dimensional structures that have structural and catalyticroles in the cell.

We begin this chapter with the first step in decoding a genome: the processof transcription by which an RNA molecule is produced from the DNA of a gene.We then follow the fate of this RNA molecule through the cell, finishing when acorrectly folded protein molecule has been formed. At the end of the chapter, weconsider how the present, quite complex, scheme of information storage, tran-scription, and translation might have arisen from simpler systems in the earlieststages of cellular evolution.

HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 301

Figure 6–1 (opposite page) Schematic depiction of a portion of chromosome 2 from the genome ofthe fruit fly Drosophila melanogaster.This figure represents approximately 3% of the total Drosophila genome,arranged as six contiguous segments.As summarized in the key, the symbolic representations are: rainbow-coloredbar: G–C base-pair content; black vertical lines of various thicknesses: locations of transposable elements, withthicker bars indicating clusters of elements; colored boxes: genes (both known and predicted) coded on one strandof DNA (boxes above the midline) and genes coded on the other strand (boxes below the midline).The length ofeach predicted gene includes both its exons (protein-coding DNA) and its introns (non-coding DNA) (see Figure4–25).As indicated in the key, the height of each gene box is proportional to the number of cDNAs in variousdatabases that match the gene.As described in Chapter 8, cDNAs are DNA copies of mRNA molecules, andlarge collections of the nucleotide sequences of cDNAs have been deposited in a variety of databases.The higherthe number of matches between the nucleotide sequences of cDNAs and that of a particular predicted gene, thehigher the confidence that the predicted gene is transcribed into RNA and is thus a genuine gene.The color ofeach gene box (see color code in the key) indicates whether a closely related gene is known to occur in otherorganisms. For example, MWY means the gene has close relatives in mammals, in the nematode wormCaenorhabditis elegans, and in the yeast Saccharomyces cerevisiae. MW indicates the gene has close relatives inmammals and the worm but not in yeast. (From Mark D.Adams et al., Science 287:2185–2195, 2000. © AAAS.)

Figure 6–2 The pathway from DNAto protein. The flow of geneticinformation from DNA to RNA(transcription) and from RNA to protein(translation) occurs in all living cells.

H2N COOH

5¢ 3¢

5¢3¢

5¢ 3¢

PROTEIN

RNA

DNA

amino acids

protein synthesis(translation)

RNA synthesis(transcription)

DNA replicationDNA repair

genetic recombination

Page 4: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

FROM DNA TO RNATranscription and translation are the means by which cells read out, or express,the genetic instructions in their genes. Because many identical RNA copies canbe made from the same gene, and each RNA molecule can direct the synthesisof many identical protein molecules, cells can synthesize a large amount ofprotein rapidly when necessary. But each gene can also be transcribed andtranslated with a different efficiency, allowing the cell to make vast quantities ofsome proteins and tiny quantities of others (Figure 6–3). Moreover, as we see inthe next chapter, a cell can change (or regulate) the expression of each of itsgenes according to the needs of the moment—most obviously by controllingthe production of its RNA.

Portions of DNA Sequence Are Transcribed into RNA

The first step a cell takes in reading out a needed part of its genetic instructionsis to copy a particular portion of its DNA nucleotide sequence—a gene—into anRNA nucleotide sequence. The information in RNA, although copied into anotherchemical form, is still written in essentially the same language as it is in DNA—the language of a nucleotide sequence. Hence the name transcription.

Like DNA, RNA is a linear polymer made of four different types of nucleotidesubunits linked together by phosphodiester bonds (Figure 6–4). It differs fromDNA chemically in two respects: (1) the nucleotides in RNA areribonucleotides—that is, they contain the sugar ribose (hence the name ribonu-cleic acid) rather than deoxyribose; (2) although, like DNA, RNA contains thebases adenine (A), guanine (G), and cytosine (C), it contains the base uracil (U)instead of the thymine (T) in DNA. Since U, like T, can base-pair by hydrogen-bonding with A (Figure 6–5), the complementary base-pairing propertiesdescribed for DNA in Chapters 4 and 5 apply also to RNA (in RNA, G pairs withC, and A pairs with U). It is not uncommon, however, to find other types of basepairs in RNA: for example, G pairing with U occasionally.

Despite these small chemical differences, DNA and RNA differ quite dra-matically in overall structure. Whereas DNA always occurs in cells as a double-stranded helix, RNA is single-stranded. RNA chains therefore fold up into avariety of shapes, just as a polypeptide chain folds up to form the final shape ofa protein (Figure 6–6). As we see later in this chapter, the ability to fold into com-plex three-dimensional shapes allows some RNA molecules to have structuraland catalytic functions.

Transcription Produces RNA Complementary to One Strand of DNA

All of the RNA in a cell is made by DNA transcription, a process that has cer-tain similarities to the process of DNA replication discussed in Chapter 5.

302 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

Figure 6–3 Genes can be expressedwith different efficiencies. Gene A istranscribed and translated much moreefficiently than gene B.This allows theamount of protein A in the cell to bemuch greater than that of protein B.

A A A A A

A A A A A

A A A A A

A A A A A

A A A A A

B

DNA

gene A gene B

TRANSCRIPTION

TRANSLATION TRANSLATION

TRANSCRIPTION

RNA RNA

Page 5: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

Transcription begins with the opening and unwinding of a small portion of theDNA double helix to expose the bases on each DNA strand. One of the twostrands of the DNA double helix then acts as a template for the synthesis of anRNA molecule. As in DNA replication, the nucleotide sequence of the RNA chainis determined by the complementary base-pairing between incomingnucleotides and the DNA template. When a good match is made, the incomingribonucleotide is covalently linked to the growing RNA chain in an enzymati-cally catalyzed reaction. The RNA chain produced by transcription—the tran-script—is therefore elongated one nucleotide at a time, and it has a nucleotidesequence that is exactly complementary to the strand of DNA used as the tem-plate (Figure 6–7).

Transcription, however, differs from DNA replication in several crucial ways.Unlike a newly formed DNA strand, the RNA strand does not remain hydrogen-bonded to the DNA template strand. Instead, just behind the region where theribonucleotides are being added, the RNA chain is displaced and the DNA helixre-forms. Thus, the RNA molecules produced by transcription are released fromthe DNA template as single strands. In addition, because they are copied fromonly a limited region of the DNA, RNA molecules are much shorter than DNAmolecules. A DNA molecule in a human chromosome can be up to 250 millionnucleotide-pairs long; in contrast, most RNAs are no more than a few thousandnucleotides long, and many are considerably shorter.

The enzymes that perform transcription are called RNA polymerases. Likethe DNA polymerase that catalyzes DNA replication (discussed in Chapter 5),RNA polymerases catalyze the formation of the phosphodiester bonds that linkthe nucleotides together to form a linear chain. The RNA polymerase movesstepwise along the DNA, unwinding the DNA helix just ahead of the active sitefor polymerization to expose a new region of the template strand for comple-mentary base-pairing. In this way, the growing RNA chain is extended by onenucleotide at a time in the 5¢-to-3¢ direction (Figure 6–8). The substrates arenucleoside triphosphates (ATP, CTP, UTP, and GTP); as for DNA replication, ahydrolysis of high-energy bonds provides the energy needed to drive the reac-tion forward (see Figure 5–4).

The almost immediate release of the RNA strand from the DNA as it is syn-thesized means that many RNA copies can be made from the same gene in a

FROM DNA TO RNA 303

OH2C

O OH

O

P

OH2C

O OH

O

P

OH2C

O OH

O

P

OHOCH2

H H

OH

H H

OH H

OHOCH2

H H

OH

H H

OH OH

OH2C

O OH

ribose

used in ribonucleicacid (RNA)

deoxyribose

used in deoxyribonucleicacid (DNA)

(A)

uracil

used in RNA

thymine

used in DNA

(B) O

C

C

HC

HC

NH

N

H

O

OO

CC

CHC

NH

N

H

H3C

O

O

O–O PC

A

U

G

3¢ end(C)

5¢ end

bases

ribose

O–O P

O–O P

O–O P

Figure 6–4 The chemical structure of RNA. (A) RNA contains thesugar ribose, which differs from deoxyribose, the sugar used in DNA, by thepresence of an additional –OH group. (B) RNA contains the base uracil,which differs from thymine, the equivalent base in DNA, by the absence of a–CH3 group. (C) A short length of RNA.The phosphodiester chemicallinkage between nucleotides in RNA is the same as that in DNA.

CC

CC

N

NO

uracil

O

H

N

N

N

N

CC

CC

HC

H

HNH

adenine

3¢5¢

5¢3¢

sugar-phosphate backbone

H

H

Figure 6–5 Uracil forms base pairs with adenine. The absence of amethyl group in U has no effect on base-pairing; thus, U–A base pairs closelyresemble T–A base pairs (see Figure 4–4).

Page 6: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

relatively short time, the synthesis of additional RNA molecules being startedbefore the first RNA is completed (Figure 6–9). When RNA polymerase moleculesfollow hard on each other’s heels in this way, each moving at about 20nucleotides per second (the speed in eucaryotes), over a thousand transcriptscan be synthesized in an hour from a single gene.

Although RNA polymerase catalyzes essentially the same chemical reactionas DNA polymerase, there are some important differences between the twoenzymes. First, and most obvious, RNA polymerase catalyzes the linkage ofribonucleotides, not deoxyribonucleotides. Second, unlike the DNA polymerasesinvolved in DNA replication, RNA polymerases can start an RNA chain withouta primer. This difference may exist because transcription need not be as accu-rate as DNA replication (see Table 5–1, p. 243). Unlike DNA, RNA does not per-manently store genetic information in cells. RNA polymerases make about onemistake for every 104 nucleotides copied into RNA (compared with an error ratefor direct copying by DNA polymerase of about one in 107 nucleotides), and theconsequences of an error in RNA transcription are much less significant thanthat in DNA replication.

Although RNA polymerases are not nearly as accurate as the DNA poly-merases that replicate DNA, they nonetheless have a modest proofreadingmechanism. If the incorrect ribonucleotide is added to the growing RNA chain,the polymerase can back up, and the active site of the enzyme can perform anexcision reaction that mimics the reverse of the polymerization reaction, exceptthat water instead of pyrophosphate is used (see Figure 5–4). RNA polymerasehovers around a misincorporated ribonucleotide longer than it does for a cor-rect addition, causing excision to be favored for incorrect nucleotides. However,RNA polymerase also excises many correct bases as part of the cost for improvedaccuracy.

Cells Produce Several Types of RNA

The majority of genes carried in a cell’s DNA specify the amino acid sequence ofproteins; the RNA molecules that are copied from these genes (which ultimatelydirect the synthesis of proteins) are called messenger RNA (mRNA) molecules.

304 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

Figure 6–7 DNA transcriptionproduces a single-stranded RNAmolecule that is complementary to one strand of DNA.

GGGA

CCCU

AGCUUAAA

UCGAAUUU

AUGCAU

UACGU

A

AAA

UUU

U A U GA

U A C

GC

AU

GC G

C

AUG

C

(A) (B) (C)

DNA

RNA

TRANSCRIPTION

template strand

5¢ 3¢

5¢ 3¢

5¢3¢

Figure 6–6 RNA can fold into specific structures. RNA is largely single-stranded, but it often contains shortstretches of nucleotides that can form conventional base-pairs with complementary sequences found elsewhereon the same molecule.These interactions, along with additional “nonconventional” base-pair interactions, allow anRNA molecule to fold into a three-dimensional structure that is determined by its sequence of nucleotides.(A) Diagram of a folded RNA structure showing only conventional base-pair interactions; (B) structure with bothconventional (red) and nonconventional (green) base-pair interactions; (C) structure of an actual RNA, a portion ofa group 1 intron (see Figure 6–36). Each conventional base-pair interaction is indicated by a “rung” in the doublehelix. Bases in other configurations are indicated by broken rungs.

Page 7: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

The final product of a minority of genes, however, is the RNA itself. Carefulanalysis of the complete DNA sequence of the genome of the yeast S. cerevisiaehas uncovered well over 750 genes (somewhat more than 10% of the total num-ber of yeast genes) that produce RNA as their final product, although this num-ber includes multiple copies of some highly repeated genes. These RNAs, likeproteins, serve as enzymatic and structural components for a wide variety ofprocesses in the cell. In Chapter 5 we encountered one of those RNAs, the tem-plate carried by the enzyme telomerase. Although not all of their functions areknown, we see in this chapter that some small nuclear RNA (snRNA) moleculesdirect the splicing of pre-mRNA to form mRNA, that ribosomal RNA (rRNA)molecules form the core of ribosomes, and that transfer RNA (tRNA) moleculesform the adaptors that select amino acids and hold them in place on a ribosomefor incorporation into protein (Table 6–1).

Each transcribed segment of DNA is called a transcription unit. In eucary-otes, a transcription unit typically carries the information of just one gene, andtherefore codes for either a single RNA molecule or a single protein (or group ofrelated proteins if the initial RNA transcript is spliced in more than one way toproduce different mRNAs). In bacteria, a set of adjacent genes is often trans-cribed as a unit; the resulting mRNA molecule therefore carries the informationfor several distinct proteins.

Overall, RNA makes up a few percent of a cell’s dry weight. Most of the RNAin cells is rRNA; mRNA comprises only 3–5% of the total RNA in a typical mam-malian cell. The mRNA population is made up of tens of thousands of differentspecies, and there are on average only 10–15 molecules of each species of mRNApresent in each cell.

FROM DNA TO RNA 305

Figure 6–8 DNA is transcribed bythe enzyme RNA polymerase. TheRNA polymerase (pale blue) movesstepwise along the DNA, unwinding theDNA helix at its active site.As itprogresses, the polymerase addsnucleotides (here, small “T” shapes) one byone to the RNA chain at thepolymerization site using an exposedDNA strand as a template.The RNAtranscript is thus a single-strandedcomplementary copy of one of the twoDNA strands.The polymerase has arudder (see Figure 6–11) that displacesthe newly formed RNA, allowing the twostrands of DNA behind the polymerase torewind.A short region of DNA/RNA helix(approximately nine nucleotides in length)is therefore formed only transiently, and a“window” of DNA/RNA helix thereforemoves along the DNA with thepolymerase.The incoming nucleotides arein the form of ribonucleosidetriphosphates (ATP, UTP, CTP, and GTP),and the energy stored in theirphosphate–phosphate bonds provides thedriving force for the polymerizationreaction (see Figure 5–4). (Adapted from afigure kindly supplied by Robert Landick.)

Figure 6–9 Transcription of two genes as observed under theelectron microscope. The micrograph shows many molecules of RNApolymerase simultaneously transcribing each of two adjacent genes.Molecules of RNA polymerase are visible as a series of dots along the DNAwith the newly synthesized transcripts (fine threads) attached to them.TheRNA molecules (ribosomal RNAs) shown in this example are not translatedinto protein but are instead used directly as components of ribosomes, themachines on which translation takes place.The particles at the 5¢ end (thefree end) of each rRNA transcript are believed to reflect the beginnings ofribosome assembly. From the lengths of the newly synthesized transcripts, itcan be deduced that the RNA polymerase molecules are transcribing fromleft to right. (Courtesy of Ulrich Scheer.)

direction oftranscription

RNA polymerase

ribonucleotidetriphosphates

ribonucleotidetriphosphatetunnel

active site

jaws in closedconfiguration

DNA doublehelix

short region ofDNA/RNA helix

flap inclosed

position

DNArewinding

newly synthesizedRNA transcript

RNA exitchannel5¢

5¢3¢

1 mm

Page 8: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

Signals Encoded in DNA Tell RNA Polymerase Where to Start and Stop

To transcribe a gene accurately, RNA polymerase must recognize where on thegenome to start and where to finish. The way in which RNA polymerases per-form these tasks differs somewhat between bacteria and eucaryotes. Becausethe process in bacteria is simpler, we look there first.

The initiation of transcription is an especially important step in gene expres-sion because it is the main point at which the cell regulates which proteins areto be produced and at what rate. Bacterial RNA polymerase is a multisubunitcomplex. A detachable subunit, called sigma (s) factor, is largely responsible forits ability to read the signals in the DNA that tell it where to begin transcribing(Figure 6–10). RNA polymerase molecules adhere only weakly to the bacterialDNA when they collide with it, and a polymerase molecule typically slidesrapidly along the long DNA molecule until it dissociates again. However, whenthe polymerase slides into a region on the DNA double helix called a promoter,a special sequence of nucleotides indicating the starting point for RNA synthe-sis, it binds tightly to it. The polymerase, using its s factor, recognizes this DNAsequence by making specific contacts with the portions of the bases that areexposed on the outside of the helix (Step 1 in Figure 6–10).

After the RNA polymerase binds tightly to the promoter DNA in this way, itopens up the double helix to expose a short stretch of nucleotides on eachstrand (Step 2 in Figure 6–10). Unlike a DNA helicase reaction (see Figure 5–15),this limited opening of the helix does not require the energy of ATP hydrolysis.Instead, the polymerase and DNA both undergo reversible structural changesthat result in a more energetically favorable state. With the DNA unwound, oneof the two exposed DNA strands acts as a template for complementary base-pairing with incoming ribonucleotides (see Figure 6–7), two of which are joinedtogether by the polymerase to begin an RNA chain. After the first ten or sonucleotides of RNA have been synthesized (a relatively inefficient process dur-ing which polymerase synthesizes and discards short nucleotide oligomers), thes factor relaxes its tight hold on the polymerase and evenutally dissociates fromit. During this process, the polymerase undergoes additional structural changesthat enable it to move forward rapidly, transcribing without the s factor (Step 4in Figure 6–10). Chain elongation continues (at a speed of approximately 50nucleotides/sec for bacterial RNA polymerases) until the enzyme encounters asecond signal in the DNA, the terminator (described below), where the poly-merase halts and releases both the DNA template and the newly made RNAchain (Step 7 in Figure 6–10). After the polymerase has been released at a termi-nator, it reassociates with a free s factor and searches for a new promoter, whereit can begin the process of transcription again.

Several structural features of bacterial RNA polymerase make it particularlyadept at performing the transcription cycle just described. Once the s factor

306 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

TABLE 6–1 Principal Types of RNAs Produced in Cells

TYPE OF RNA FUNCTION

mRNAs messenger RNAs, code for proteins

rRNAs ribosomal RNAs, form the basic structure of the ribosome and catalyze protein synthesis

tRNAs transfer RNAs, central to protein synthesis as adaptorsbetween mRNA and amino acids

snRNAs small nuclear RNAs, function in a variety of nuclear processes, including the splicing of pre-mRNA

snoRNAs small nucleolar RNAs, used to process and chemically modify rRNAs

Other noncoding function in diverse cellular processes, including RNAs telomere synthesis, X-chromosome inactivation,

and the transport of proteins into the ER

Page 9: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

positions the polymerase on the promoter and the template DNA has beenunwound and pushed to the active site, a pair of moveable jaws is thought toclamp onto the DNA (Figure 6–11). When the first 10 nucleotides have beentranscribed, the dissociation of s allows a flap at the back of the polymerase to

FROM DNA TO RNA 307

Figure 6–11 The structure of a bacterial RNA polymerase. Two depictions of the three-dimensionalstructure of a bacterial RNA polymerase, with the DNA and RNA modeled in.This RNA polymerase isformed from four different subunits, indicated by different colors (right). The DNA strand used as a templateis red, and the non-template strand is yellow.The rudder wedges apart the DNA–RNA hybrid as thepolymerase moves. For simplicity only the polypeptide backbone of the rudder is shown in the right-handfigure, and the DNA exiting from the polymerase has been omitted. Because the RNA polymerase is depictedin the elongation mode, the s factor is absent. (Courtesy of Seth Darst.)

RNA

DNA

σ factor

RNA polymerase

promoter

1

2

3

4

5

6

7

RNARNA

Figure 6–10 The transcription cycleof bacterial RNA polymerase. In step1, the RNA polymerase holoenzyme (corepolymerase plus s factor) forms and thenlocates a promoter (see Figure 6–12).Thepolymerase unwinds the DNA at theposition at which transcription is to begin(step 2) and begins transcribing (step 3).This initial RNA synthesis (sometimescalled “abortive initiation”) is relativelyinefficient. However, once RNApolymerase has managed to synthesizeabout 10 nucleotides of RNA, s relaxes itsgrip, and the polymerase undergoes aseries of conformational changes (whichprobably includes a tightening of its jawsand the placement of RNA in the exitchannel [see Figure 6–11]).Thepolymerase now shifts to the elongationmode of RNA synthesis (step 4), movingrightwards along the DNA in this diagram.During the elongation mode (step 5)transcription is highly processive, with thepolymerase leaving the DNA template andreleasing the newly transcribed RNA onlywhen it encounters a termination signal(step 6).Termination signals are encodedin DNA and many function by forming anRNA structure that destabilizes thepolymerase’s hold on the RNA, as shownhere. In bacteria, all RNA molecules aresynthesized by a single type of RNApolymerase and the cycle depicted in thefigure therefore applies to the productionof mRNAs as well as structural andcatalytic RNAs. (Adapted from a figurekindly supplied by Robert Landick.)

rudder templateDNAstrand

newly synthesizedRNA transcript

flap

exit pathfor DNAdoublehelix

RNA in shortDNA/RNA helix

displaced non-template DNAstranddirection of transcription

site of nucleotideaddition

rudder

path ofdownstream

DNA helix

Page 10: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

close to form an exit tunnel through which the newly made RNA leaves theenzyme. With the polymerase now functioning in its elongation mode, a rudder-like structure in the enzyme continuously pries apart the DNA–RNA hybridformed. We can view the series of conformational changes that takes place dur-ing transcription initiation as a successive tightening of the enzyme around theDNA and RNA to ensure that it does not dissociate before it has finished tran-scribing a gene. If an RNA polymerase does dissociate prematurely, it cannotresume synthesis but must start over again at the promoter.

How do the signals in the DNA (termination signals) stop the elongatingpolymerase? For most bacterial genes a termination signal consists of a string ofA–T nucleotide pairs preceded by a two-fold symmetric DNA sequence, which,when transcribed into RNA, folds into a “hairpin” structure throughWatson–Crick base-pairing (see Figure 6–10). As the polymerase transcribesacross a terminator, the hairpin may help to wedge open the movable flap on theRNA polymerase and release the RNA transcript from the exit tunnel. At thesame time, the DNA–RNA hybrid in the active site, which is held together pre-dominantly by U–A base pairs (which are less stable than G–C base pairsbecause they form two rather than three hydrogen bonds per base pair), is notsufficiently strong enough to hold the RNA in place, and it dissociates causingthe release of the polymerase from the DNA, perhaps by forcing open its jaws.Thus, in some respects, transcription termination seems to involve a reversal ofthe structural transitions that happen during initiation. The process of termina-tion also is an example of a common theme in this chapter: the ability of RNA tofold into specific structures figures prominantly in many aspects of decoding thegenome.

Transcription Start and Stop Signals Are Heterogeneous in Nucleotide Sequence

As we have just seen, the processes of transcription initiation and terminationinvolve a complicated series of structural transitions in protein, DNA, and RNAmolecules. It is perhaps not surprising that the signals encoded in DNA thatspecify these transitions are difficult for researchers to recognize. Indeed, a com-parison of many different bacterial promoters reveals that they are heteroge-neous in DNA sequence. Nevertheless, they all contain related sequences,reflecting in part aspects of the DNA that are recognized directly by the s factor.These common features are often summarized in the form of a consensussequence (Figure 6–12). In general, a consensus nucleotide sequence is derived

308 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

T T G A C A T A T A A T15–19

nucleotides

freq

uen

cy o

f n

ucl

eoti

de

in e

ach

po

siti

on

(%

)

freq

uen

cy (

%)

50

25

0

0

50

2575

100

(A)

(B)

15 16 17 18 19spacing between –35 and –10 sequences

–35 –10

Figure 6–12 Consensus sequence forthe major class of E. coli promoters.(A) The promoters are characterized bytwo hexameric DNA sequences, the –35sequence and the –10 sequence namedfor their approximate location relative tothe start point of transcription (designated+1). For convenience, the nucleotidesequence of a single strand of DNA isshown; in reality the RNA polymeraserecognizes the promoter as double-stranded DNA. On the basis of acomparison of 300 promoters, thefrequencies of the four nucleotides at eachposition in the –35 and –10 hexamers aregiven.The consensus sequence, shownbelow the graph, reflects the mostcommon nucleotide found at eachposition in the collection of promoters.The sequence of nucleotides between the–35 and –10 hexamers shows nosignificant similarities among promoters.(B) The distribution of spacing betweenthe –35 and –10 hexamers found in E. colipromoters.The information displayed inthese two graphs applies to E. colipromoters that are recognized by RNApolymerase and the major s factor(designated s70).As we shall see in thenext chapter, bacteria also contain minors factors, each of which recognizes adifferent promoter sequence. Someparticularly strong promoters recognizedby RNA polymerase and s70 have anadditional sequence, located upstream (tothe left, in the figure) of the –35 hexamer,which is recognized by another subunit ofRNA polymerase.

Page 11: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

by comparing many sequences with the same basic function and tallying up themost common nucleotide found at each position. It therefore serves as a sum-mary or “average” of a large number of individual nucleotide sequences.

One reason that individual bacterial promoters differ in DNA sequence isthat the precise sequence determines the strength (or number of initiationevents per unit time) of the promoter. Evolutionary processes have thus fine-tuned each promoter to initiate as often as necessary and have created a widespectrum of promoters. Promoters for genes that code for abundant proteins aremuch stronger than those associated with genes that encode rare proteins, andtheir nucleotide sequences are responsible for these differences.

Like bacterial promoters, transcription terminators also include a widerange of sequences, with the potential to form a simple RNA structure being themost important common feature. Since an almost unlimited number ofnucleotide sequences have this potential, terminator sequences are much moreheterogeneous than those of promoters.

We have discussed bacterial promoters and terminators in some detail toillustrate an important point regarding the analysis of genome sequences.Although we know a great deal about bacterial promoters and terminators andcan develop consensus sequences that summarize their most salient features,their variation in nucleotide sequence makes it difficult for researchers (evenwhen aided by powerful computers) to definitively locate them simply byinspection of the nucleotide sequence of a genome. When we encounter analo-gous types of sequences in eucaryotes, the problem of locating them is evenmore difficult. Often, additional information, some of it from direct experimen-tation, is needed to accurately locate the short DNA signals contained ingenomes.

Promoter sequences are asymmetric (see Figure 6–12), and this feature hasimportant consequences for their arrangement in genomes. Since DNA is dou-ble-stranded, two different RNA molecules could in principle be transcribedfrom any gene, using each of the two DNA strands as a template. However a genetypically has only a single promoter, and because the nucleotide sequences ofbacterial (as well as eucaryotic) promoters are asymmetric the polymerase canbind in only one orientation. The polymerase thus has no option but to tran-scribe the one DNA strand, since it can synthesize RNA only in the 5¢ to 3¢ direc-tion (Figure 6–13). The choice of template strand for each gene is thereforedetermined by the location and orientation of the promoter. Genome sequencesreveal that the DNA strand used as the template for RNA synthesis varies fromgene to gene (Figure 6–14; see also Figure 1–31).

Having considered transcription in bacteria, we now turn to the situation ineucaryotes, where the synthesis of RNA molecules is a much more elaborateaffair.

Transcription Initiation in Eucaryotes Requires Many Proteins

In contrast to bacteria, which contain a single type of RNA polymerase, eucary-otic nuclei have three, called RNA polymerase I, RNA polymerase II, and RNA

FROM DNA TO RNA 309

Figure 6–13 The importance of RNA polymerase orientation. TheDNA strand serving as template must be traversed in a 3¢ to 5¢ direction, asillustrated in Figure 6–9.Thus, the direction of RNA polymerase movementdetermines which of the two DNA strands is to serve as a template for thesynthesis of RNA, as shown in (A) and (B). Polymerase direction is, in turn,determined by the orientation of the promoter sequence, the site at whichthe RNA polymerase begins transcription.

C C C C C C C C C C C C C C C C C C

G G G G G G G G G G G G G G G G G G

G G G G G G G

3¢5¢

5¢3¢

5¢3¢

DNA double helix

RNA

an RNA polymerase that moves from rightto left makes RNA by using the top strandas a template

C C C C C C C C C C C C C C C C C C

G G G G G G G G G G G G G G G G G G

3¢5¢

5¢3¢

C C C C C C C

3¢5¢

an RNA polymerase that moves from leftto right makes RNA by using the bottomstrand as a template

(A)

(B)

Figure 6–14 Directions oftranscription along a short portionof a bacterial chromosome. Somegenes are transcribed using one DNAstrand as a template, while others aretranscribed using the other DNA strand.The direction of transcription isdetermined by the promoter at thebeginning of each gene (green arrowheads).Approximately 0.2% (9000 base pairs) ofthe E. coli chromosome is depicted here.The genes transcribed from left to rightuse the bottom DNA strand as thetemplate; those transcribed from right toleft use the top strand as the template.5000 nucleotide pairs

5¢gene ggene fgene cgene b

gene a gene d gene e

RNA transcripts

DNA of E. coli chromosome

Page 12: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

polymerase III. The three polymerases are structurally similar to one another(and to the bacterial enzyme). They share some common subunits and manystructural features, but they transcribe different types of genes (Table 6–2). RNApolymerases I and III transcribe the genes encoding transfer RNA, ribosomalRNA, and various small RNAs. RNA polymerase II transcribes the vast majorityof genes, including all those that encode proteins, and our subsequent discus-sion therefore focuses on this enzyme.

Although eucaryotic RNA polymerase II has many structural similarities tobacterial RNA polymerase (Figure 6–15), there are several important differencesin the way in which the bacterial and eucaryotic enzymes function, two of whichconcern us immediately.

1. While bacterial RNA polymerase (with s factor as one of its subunits) isable to initiate transcription on a DNA template in vitro without the helpof additional proteins, eucaryotic RNA polymerases cannot. They requirethe help of a large set of proteins called general transcription factors, whichmust assemble at the promoter with the polymerase before the poly-merase can begin transcription.

2. Eucaryotic transcription initiation must deal with the packing of DNA intonucleosomes and higher order forms of chromatin structure, featuresabsent from bacterial chromosomes.

RNA Polymerase II Requires General Transcription Factors

The discovery that, unlike bacterial RNA polymerase, purified eucaryotic RNApolymerase II could not initiate transcription in vitro led to the discovery andpurification of the additional factors required for this process. These generaltranscription factors help to position the RNA polymerase correctly at the pro-moter, aid in pulling apart the two strands of DNA to allow transcription tobegin, and release RNA polymerase from the promoter into the elongation modeonce transcription has begun. The proteins are “general” because they assembleon all promoters used by RNA polymerase II; consisting of a set of interactingproteins, they are designated as TFII (for transcription factor for polymerase II),

310 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

TABLE 6–2 The Three RNA Polymerases in Eucaryotic Cells

TYPE OF POLYMERASE GENES TRANSCRIBED

RNA polymerase I 5.8S, 18S, and 28S rRNA genes

RNA polymerase II all protein-coding genes, plus snoRNA genesand some snRNA genes

RNA polymerase III tRNA genes, 5S rRNA genes, some snRNA genesand genes for other small RNAs

Figure 6–15 Structural similaritybetween a bacterial RNApolymerase and a eucaryotic RNApolymerase II. Regions of the two RNApolymerases that have similar structuresare indicated in green.The eucaryoticpolymerase is larger than the bacterialenzyme (12 subunits instead of 5), andsome of the additional regions are shownin gray. The blue spheres represent Znatoms that serve as structuralcomponents of the polymerases, and thered sphere represents the Mg atompresent at the active site, wherepolymerization takes place.The RNApolymerases in all modern-day cells(bacteria, archaea, and eucaryotes) areclosely related, indicating that the basicfeatures of the enzyme were in placebefore the divergence of the three majorbranches of life. (Courtesy of P. Cramerand R. Kornberg.)

Page 13: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

and listed as TFIIA, TFIIB, and so on. In a broad sense, the eucaryotic generaltranscription factors carry out functions equivalent to those of the s factor inbacteria.

Figure 6–16 shows how the general transcription factors assemble in vitro atpromoters used by RNA polymerase II. The assembly process starts with thebinding of the general transcription factor TFIID to a short double-helical DNAsequence primarily composed of T and A nucleotides. For this reason, thissequence is known as the TATA sequence, or TATA box, and the subunit of TFIIDthat recognizes it is called TBP (for TATA-binding protein). The TATA box is typi-cally located 25 nucleotides upstream from the transcription start site. It is notthe only DNA sequence that signals the start of transcription (Figure 6–17), but

FROM DNA TO RNA 311

Figure 6–16 Initiation of transcription of a eucaryotic gene by RNApolymerase II. To begin transcription, RNA polymerase requires a numberof general transcription factors (called TFIIA,TFIIB, and so on). (A) Thepromoter contains a DNA sequence called the TATA box, which is located25 nucleotides away from the site at which transcription is initiated. (B) TheTATA box is recognized and bound by transcription factor TFIID, which thenenables the adjacent binding of TFIIB (C). For simplicity the DNA distortionproduced by the binding of TFIID (see Figure 6–18) is not shown. (D) Therest of the general transcription factors, as well as the RNA polymeraseitself, assemble at the promoter. (E) TFIIH then uses ATP to pry apart theDNA double helix at the transcription start point, allowing transcription tobegin.TFIIH also phosphorylates RNA polymerase II, changing itsconformation so that the polymerase is released from the general factorsand can begin the elongation phase of transcription.As shown, the site ofphosphorylation is a long C-terminal polypeptide tail that extends from thepolymerase molecule.The assembly scheme shown in the figure wasdeduced from experiments performed in vitro, and the exact order in whichthe general transcription factors assemble on promoters in cells is notknown with certainty. In some cases, the general factors are thought to firstassemble with the polymerase, with the whole assembly subsequentlybinding to the DNA in a single step.The general transcription factors havebeen highly conserved in evolution; some of those from human cells can bereplaced in biochemical experiments by the corresponding factors fromsimple yeasts.

TATA box

TFIIDTBP

start of transcription

TFIIB

TFIIA

TFIIE

TFIIF

RNA polymerase II

TRANSCRIPTION

P P P P

TFIIH

other factors

(A)

(B)

(C)

(D)

(E)

UTP, ATPCTP, GTP

RNA

Figure 6–17 Consensus sequences found in the vicinity of eucaryotic RNApolymerase II start points. The name given to each consensus sequence (firstcolumn) and the general transcription factor that recognizes it (last column) areindicated. N indicates any nucleotide, and two nucleotides separated by a slashindicate an equal probability of either nucleotide at the indicated position. In reality,each consensus sequence is a shorthand representation of a histogram similar to thatof Figure 6–12. For most RNA polymerase II transcription start points, only two orthree of the four sequences are present. For example, most polymerase II promotershave a TATA box sequence, and those that do not typically have a “strong” INRsequence. Although most of the DNA sequences that influence transcription initiationare located “upstream” of the transcription start point, a few, such as the DPE shownin the figure, are located in the transcribed region.

G/C G/C G/A C G C C TFIIB BRE

T A T A A/T A A/T TBP TATA

C/T C/T A N T/A C/T C/T TFIID INR

A/G G A/T C G T G TFIID DPE

element consensussequence

generaltranscription

factor

BRE TATA INR DPE

–35 –30 +30

transcriptionstart point

Page 14: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

for most polymerase II promoters, it is the most important. The binding of TFIIDcauses a large distortion in the DNA of the TATA box (Figure 6–18). This distor-tion is thought to serve as a physical landmark for the location of an active pro-moter in the midst of a very large genome, and it brings DNA sequences on bothsides of the distortion together to allow for subsequent protein assembly steps.Other factors are then assembled, along with RNA polymerase II, to form a com-plete transcription initiation complex (see Figure 6–16).

After RNA polymerase II has been guided onto the promoter DNA to form atranscription initiation complex, it must gain access to the template strand atthe transcription start point. This step is aided by one of the general transcrip-tion factors, TFIIH, which contains a DNA helicase. Next, like the bacterial poly-merase, polymerase II remains at the promoter, synthesizing short lengths ofRNA until it undergoes a conformational change and is released to begin tran-scribing a gene. A key step in this release is the addition of phosphate groups tothe “tail” of the RNA polymerase (known as the CTD or C-terminal domain). Thisphosphorylation is also catalyzed by TFIIH, which, in addition to a helicase, con-tains a protein kinase as one of its subunits (see Figure 6–16, D and E). The poly-merase can then disengage from the cluster of general transcription factors,undergoing a series of conformational changes that tighten its interaction withDNA and acquiring new proteins that allow it to transcribe for long distanceswithout dissociating.

Once the polymerase II has begun elongating the RNA transcript, most ofthe general transcription factors are released from the DNA so that they areavailable to initiate another round of transcription with a new RNA polymerasemolecule. As we see shortly, the phosphorylation of the tail of RNA polymeraseII also causes components of the RNA processing machinery to load onto thepolymerase and thus be in position to modify the newly transcribed RNA as itemerges from the polymerase.

Polymerase II Also Requires Activator, Mediator, andChromatin-modifying Proteins

The model for transcription initiation just described was established by study-ing the action of RNA polymerase II and its general transcription factors onpurified DNA templates in vitro. However, as discussed in Chapter 4, DNA ineucaryotic cells is packaged into nucleosomes, which are further arranged inhigher-order chromatin structures. As a result, transcription initiation in aeucaryotic cell is more complex and requires more proteins than it does on puri-fied DNA. First, gene regulatory proteins known as transcriptional activators

312 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

Figure 6–18 Three-dimensionalstructure of TBP (TATA-bindingprotein) bound to DNA. The TBP isthe subunit of the general transcriptionfactor TFIID that is responsible forrecognizing and binding to the TATA boxsequence in the DNA (red). The uniqueDNA bending caused by TBP—two kinksin the double helix separated by partlyunwound DNA—may serve as a landmarkthat helps to attract the other generaltranscription factors.TBP is a singlepolypeptide chain that is folded into twovery similar domains (blue and green).(Adapted from J.L. Kim et al., Nature365:520–527, 1993.)

N

C

G

A

A

AA

A

T

T

Page 15: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

bind to specific sequences in DNA and help to attract RNA polymerase II to thestart point of transcription (Figure 6–19). This attraction is needed to help theRNA polymerase and the general transcription factors in overcoming the diffi-culty of binding to DNA that is packaged in chromatin. We discuss the role ofactivators in Chapter 7, because they represent one of the main ways in whichcells regulate expression of their genes. Here we simply note that their presenceon DNA is required for transcription initiation in a eucaryotic cell. Second,eucaryotic transcription initiation in vivo requires the presence of a proteincomplex known as the mediator, which allows the activator proteins to com-municate properly with the polymerase II and with the general transcriptionfactors. Finally, transcription initiation in the cell often requires the localrecruitment of chromatin-modifying enzymes, including chromatin remodel-ing complexes and histone acetylases (see Figure 6–19). As discussed in Chapter4, both types of enzymes can allow greater accessibility to the DNA present inchromatin, and by doing so, they facilitate the assembly of the transcription ini-tiation machinery onto DNA.

As illustrated in Figure 6–19, many proteins (well over one hundred indi-vidual subunits) must assemble at the start point of transcription to initiatetranscription in a eucaryotic cell. The order of assembly of these proteins isprobably different for different genes and therefore may not follow a prescribedpathway. In fact, some of these different protein assemblies may interact witheach other away from the DNA and be brought to DNA as preformed subcom-plexes. For example, the mediator, RNA polymerase II, and some of the generaltranscription factors can bind to each other in the nucleoplasm and be broughtto the DNA as a unit. We return to this issue in Chapter 7, where we discuss themany ways eucaryotic cells can regulate the process of transcription initiation.

Transcription Elongation Produces Superhelical Tension in DNA

Once it has initiated transcription, RNA polymerase does not proceed smoothlyalong a DNA molecule; rather it moves jerkily, pausing at some sequences andrapidly transcribing through others. Elongating RNA polymerases, both bacterialand eucaryotic, are associated with a series of elongation factors, proteins thatdecrease the likelihood that RNA polymerase will dissociate before it reaches theend of a gene. These factors typically associate with RNA polymerase shortlyafter initiation has occurred and help polymerases to move through the wide

FROM DNA TO RNA 313

TRANSCRIPTION BEGINS

activator protein

enhancer(binding site for

activator protein)BINDING OFGENERAL TRANSCRIPTIONFACTORS, RNA POLYMERASE,MEDIATOR, CHROMATIN REMODELINGCOMPLEXES, AND HISTONE ACETYLASES

TATA boxstart of

transcription

mediator

chromatinremodelingcomplex

histone acetylase

Figure 6–19 Transcription initiationby RNA polymerase II in aeucaryotic cell. Transcription initiation invivo requires the presence oftranscriptional activator proteins.Asdescribed in Chapter 7, these proteinsbind to specific short sequences in DNA.Although only one is shown here, a typicaleucaryotic gene has many activatorproteins, which together determine itsrate and pattern of transcription.Sometimes acting from a distance ofseveral thousand nucleotide pairs(indicated by the dashed DNA molecule),these gene regulatory proteins help RNApolymerase, the general factors, and themediator all to assemble at the promoter.In addition, activators attract ATP-dependent chromatin-remodelingcomplexes and histone acetylases.

As discussed in Chapter 4, the “default”state of chromatin is probably the 30-nmfilament, and this is likely to be a form ofDNA upon which transcription is initiated.For simplicity, it is not shown in the figure.

Page 16: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

variety of different DNA sequences that are found in genes. Eucaryotic RNApolymerases must also contend with chromatin structure as they move along aDNA template. Experiments have shown that bacterial polymerases, whichnever encounter nucleosomes in vivo, can nonetheless transcribe through themin vitro, suggesting that a nucleosome is easily traversed. However, eucaryoticpolymerases have to move through forms of chromatin that are more compactthan a simple nucleosome. It therefore seems likely that they transcribe with theaid of chromatin remodeling complexes (see pp. 212–213). These complexesmay move with the polymerase or may simply seek out and rescue the occa-sional stalled polymerase. In addition, some elongation factors associated witheucaryotic RNA polymerase facilitate transcription through nucleosomes with-out requiring additional energy. It is not yet understood how this is accom-plished, but these proteins may help to dislodge parts of the nucleosome core asthe polymerase transcribes the DNA of a nucleosome.

There is yet another barrier to elongating polymerases, both bacterial andeucaryotic. To discuss this issue, we need first to consider a subtle propertyinherent in the DNA double helix called DNA supercoiling. DNA supercoilingrepresents a conformation that DNA will adopt in response to superhelical ten-sion; conversely, creating various loops or coils in the helix can create such ten-sion. A simple way of visualizing the topological constraints that cause DNAsupercoiling is illustrated in Figure 6–20A. There are approximately 10nucleotide pairs for every helical turn in a DNA double helix. Imagine a helixwhose two ends are fixed with respect to each other (as they are in a DNA circle,such as a bacterial chromosome, or in a tightly clamped loop, as is thought toexist in eucaryotic chromosomes). In this case, one large DNA supercoil willform to compensate for each 10 nucleotide pairs that are opened (unwound).The formation of this supercoil is energetically favorable because it restores anormal helical twist to the base-paired regions that remain, which would other-wise need to be overwound because of the fixed ends.

Superhelical tension is also created as RNA polymerase moves along astretch of DNA that is anchored at its ends (Figure 6–20C). As long as the poly-merase is not free to rotate rapidly (and such rotation is unlikely given the size

314 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

Figure 6–20 Superhelical tension inDNA causes DNA supercoiling.(A) For a DNA molecule with one freeend (or a nick in one strand that serves asa swivel), the DNA double helix rotates byone turn for every 10 nucleotide pairsopened. (B) If rotation is prevented,superhelical tension is introduced into theDNA by helix opening. One way ofaccommodating this tension would be toincrease the helical twist from 10 to 11nucleotide pairs per turn in the doublehelix that remains in this example; theDNA helix, however, resists such adeformation in a springlike fashion,preferring to relieve the superhelicaltension by bending into supercoiled loops.As a result, one DNA supercoil forms inthe DNA double helix for every 10 nucleotide pairs opened.The supercoilformed in this case is a positive supercoil.(C) Supercoiling of DNA is induced by aprotein tracking through the DNA doublehelix.The two ends of the DNA shownhere are unable to rotate freely relative toeach other, and the protein molecule isassumed also to be prevented fromrotating freely as it moves. Under theseconditions, the movement of the proteincauses an excess of helical turns toaccumulate in the DNA helix ahead of theprotein and a deficit of helical turns toarise in the DNA behind the protein,as shown.

unwind 10 DNA base pairs(one helical turn)

DNA with free end

DNA helix must rotate one turn

(A) (B)

(C)

unwind 10 DNA base pairs(one helical turn)

DNA with fixed ends

DNA helix forms one supercoil

DNAprotein molecule

NEGATIVE SUPERCOILINGhelix opening facilitated

POSITIVE SUPERCOILINGhelix opening hindered

Page 17: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

of RNA polymerases and their attached transcripts), a moving polymerase gen-erates positive superhelical tension in the DNA in front of it and negative helicaltension behind it. For eucaryotes, this situation is thought to provide a bonus:the positive superhelical tension ahead of the polymerase makes the DNA helixmore difficult to open, but this tension should facilitate the unwrapping of DNAin nucleosomes, as the release of DNA from the histone core helps to relax pos-itive superhelical tension.

Any protein that propels itself alone along a DNA strand of a double helixtends to generate superhelical tension. In eucaryotes, DNA topoisomeraseenzymes rapidly remove this superhelical tension (see p. 251). But, in bacteria, aspecialized topoisomerase called DNA gyrase uses the energy of ATP hydrolysisto pump supercoils continuously into the DNA, thereby maintaining the DNAunder constant tension. These are negative supercoils, having the oppositehandedness from the positive supercoils that form when a region of DNA helixopens (see Figure 6–20B). These negative supercoils are removed from bacterialDNA whenever a region of helix opens, reducing the superhelical tension. DNAgyrase therefore makes the opening of the DNA helix in bacteria energeticallyfavorable compared with helix opening in DNA that is not supercoiled. For thisreason, it usually facilitates those genetic processes in bacteria, including the ini-tiation of transcription by bacterial RNA polymerase, that require helix opening(see Figure 6–10).

Transcription Elongation in Eucaryotes Is Tightly Coupled To RNA Processing

We have seen that bacterial mRNAs are synthesized solely by the RNA poly-merase starting and stopping at specific spots on the genome. The situation ineucaryotes is substantially different. In particular, transcription is only the firststep in a series of reactions that includes the covalent modification of both endsof the RNA and the removal of intron sequences that are discarded from themiddle of the RNA transcript by the process of RNA splicing (Figure 6–21). Themodifications of the ends of eucaryotic mRNA are capping on the 5¢ end andpolyadenylation of the 3¢ end (Figure 6–22). These special ends allow the cell toassess whether both ends of an mRNA molecule are present (and the messageis therefore intact) before it exports the RNA sequence from the nucleus for

FROM DNA TO RNA 315

AAAA

AAAA

cytoplasm

nucleus

“primary RNA transcript”

DNAintrons exons

transcription unitTRANSCRIPTION

TRANSLATION

EXPORT

5¢ CAPPINGRNA SPLICING3¢ POLYADENYLATION

mRNA

mRNA

protein

EUCARYOTES(A) PROCARYOTES(B)

DNA

TRANSCRIPTION

TRANSLATION

mRNA

protein

RNA cap

Figure 6–21 Summary of the stepsleading from gene to protein ineucaryotes and bacteria. The final levelof a protein in the cell depends on theefficiency of each step and on the rates ofdegradation of the RNA and proteinmolecules. (A) In eucaryotic cells the RNAmolecule produced by transcription alone(sometimes referred to as the primarytranscript) would contain both coding(exon) and noncoding (intron) sequences.Before it can be translated into protein,the two ends of the RNA are modified,the introns are removed by anenzymatically catalyzed RNA splicingreaction, and the resulting mRNA istransported from the nucleus to thecytoplasm.Although these steps aredepicted as occurring one at a time, in asequence, in reality they are coupled anddifferent steps can occur simultaneously.For example, the RNA cap is added andsplicing typically begins beforetranscription has been completed. Becauseof this coupling, complete primary RNAtranscripts do not typically exist in thecell. (B) In procaryotes the production ofmRNA molecules is much simpler.The 5¢ end of an mRNA molecule is producedby the initiation of transcription by RNApolymerase, and the 3¢ end is produced bythe termination of transcription. Sinceprocaryotic cells lack a nucleus,transcription and translation take place ina common compartment. In fact,translation of a bacterial mRNA oftenbegins before its synthesis has beencompleted.

Page 18: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

translation into protein. In Chapter 4, we saw that a typical eucaryotic gene ispresent in the genome as short blocks of protein-coding sequence (exons) sep-arated by long introns, and RNA splicing is the critically important step in whichthe different portions of a protein coding sequence are joined together. As wedescribe next, RNA splicing also provides higher eucaryotes with the ability tosynthesize several different proteins from the same gene.

These RNA processing steps are tightly coupled to transcription elongationby an ingenious mechanism. As discussed previously, a key step of the transitionof RNA polymerase II to the elongation mode of RNA synthesis is an extensivephosphorylation of the RNA polymerase II tail, called the CTD. This C-terminaldomain of the largest subunit consists of a long tandem array of a repeatedseven-amino-acid sequence, containing two serines per repeat that can bephosphorylated. Because there are 52 repeats in the CTD of human RNA poly-merase II, its complete phosphorylation would add 104 negatively chargedphosphate groups to the polymerase. This phosphorylation step not only disso-ciates the RNA polymerase II from other proteins present at the start point oftranscription, it also allows a new set of proteins to associate with the RNA poly-merase tail that function in transcription elongation and pre-mRNA processing.As discussed next, some of these processing proteins seem to “hop” from thepolymerase tail onto the nascent RNA molecule to begin processing it as itemerges from the RNA polymerase. Thus, RNA polymerase II in its elongationmode can be viewed as an RNA factory that both transcribes DNA into RNA andprocesses the RNA it produces (Figure 6–23).

RNA Capping Is the First Modification of Eucaryotic Pre-mRNAs

As soon as RNA polymerase II has produced about 25 nucleotides of RNA, the 5¢end of the new RNA molecule is modified by addition of a “cap” that consists of

316 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

PPP

PPPG

CH3

3¢5¢+

AAAAA150–250

codingsequence

noncodingsequence

codingsequence

noncodingsequence

protein

5¢ cap

(A)

(B)

eucaryotic mRNA

procaryotic mRNA

5¢ 3¢

protein α protein β protein γ

OH

P

CH2

OH

P

CH2

OH

CH2 PPP CH25¢

CH3

7-methylguanosine5¢ end of

primary transcript

5¢-to-5¢triphosphate

bridge

HO OH

N +

Figure 6–22 A comparison of the structures of procaryotic andeucaryotic mRNA molecules. (A) The 5¢ and 3¢ ends of a bacterialmRNA are the unmodified ends of the chain synthesized by the RNApolymerase, which initiates and terminates transcription at those points,respectively.The corresponding ends of a eucaryotic mRNA are formed byadding a 5¢ cap and by cleavage of the pre-mRNA transcript and the additionof a poly-A tail, respectively.The figure also illustrates another differencebetween the procaryotic and eucaryotic mRNAs: bacterial mRNAs cancontain the instructions for several different proteins, whereas eucaryoticmRNAs nearly always contain the information for only a single protein.(B) The structure of the cap at the 5¢ end of eucaryotic mRNA molecules.Note the unusual 5¢-to-5¢ linkage of the 7-methyl G to the remainder of theRNA. Many eucaryotic mRNAs carry an additional modification: the 2¢-hydroxyl group on the second ribose sugar in the mRNA is methylated(not shown).

Page 19: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

a modified guanine nucleotide (see Figure 6–22B). The capping reaction is per-formed by three enzymes acting in succession: one (a phosphatase) removesone phosphate from the 5¢ end of the nascent RNA, another (a guanyl trans-ferase) adds a GMP in a reverse linkage (5¢ to 5¢ instead of 5¢ to 3¢), and a third (amethyl transferase) adds a methyl group to the guanosine (Figure 6–24).Because all three enzymes bind to the phosphorylated RNA polymerase tail,they are poised to modify the 5¢ end of the nascent transcript as soon as itemerges from the polymerase.

The 5¢-methyl cap signals the 5¢ end of eucaryotic mRNAs, and this land-mark helps the cell to distinguish mRNAs from the other types of RNA moleculespresent in the cell. For example, RNA polymerases I and III produce uncappedRNAs during transcription, in part because these polymerases lack tails. In thenucleus, the cap binds a protein complex called CBC (cap-binding complex),which, as we discuss in subsequent sections, helps the RNA to be properly pro-cessed and exported. The 5¢ methyl cap also has an important role in the trans-lation of mRNAs in the cytosol as we discuss later in the chapter.

RNA Splicing Removes Intron Sequences from NewlyTranscribed Pre-mRNAs

As discussed in Chapter 4, the protein coding sequences of eucaryotic genes aretypically interrupted by noncoding intervening sequences (introns). Discoveredin 1977, this feature of eucaryotic genes came as a surprise to scientists, who hadbeen, until that time, familiar only with bacterial genes, which typically consistof a continuous stretch of coding DNA that is directly transcribed into mRNA. Inmarked contrast, eucaryotic genes were found to be broken up into small piecesof coding sequence (expressed sequences or exons) interspersed with muchlonger intervening sequences or introns; thus the coding portion of a eucaryoticgene is often only a small fraction of the length of the gene (Figure 6–25).

Both intron and exon sequences are transcribed into RNA. The intronsequences are removed from the newly synthesized RNA through the process ofRNA splicing. The vast majority of RNA splicing that takes place in cells func-tions in the production of mRNA, and our discussion of splicing focuses on thistype. It is termed precursor-mRNA (or pre-mRNA) splicing to denote that itoccurs on RNA molecules destined to become mRNAs. Only after 5¢ and 3¢ endprocessing and splicing have taken place is such RNA termed mRNA.

Each splicing event removes one intron, proceeding through two sequentialphosphoryl-transfer reactions known as transesterifications; these join twoexons while removing the intron as a “lariat” (Figure 6–26). Since the number ofphosphate bonds remains the same, these reactions could in principle takeplace without nucleoside triphosphate hydrolysis. However, the machinery thatcatalyzes pre-mRNA splicing is complex, consisting of 5 additional RNAmolecules and over 50 proteins, and it hydrolyzes many ATP molecules persplicing event. This complexity is presumably needed to ensure that splicing ishighly accurate, while also being sufficiently flexible to deal with the enormousvariety of introns found in a typical eucaryotic cell. Frequent mistakes in RNA

FROM DNA TO RNA 317

Figure 6–23 The “RNA factory” concept for eucaryotic RNApolymerase II. Not only does the polymerase transcribe DNA into RNA,but it also carries pre-mRNA-processing proteins on its tail, which are thentransferred to the nascent RNA at the appropriate time.There are manyRNA-processing enzymes, and not all travel with the polymerase. For RNAsplicing, for example, only a few critical components are carried on the tail;once transferred to an RNA molecule, they serve as a nucleation site for theremaining components.The RNA-processing proteins first bind to the RNApolymerase tail when it is phosphorylated late in the process oftranscription initiation (see Figure 6–16). Once RNA polymerase II finishestranscribing, it is released from DNA, the phosphates on its tail are removedby soluble phosphatases, and it can reinitiate transcription. Only thisdephosphorylated form of RNA polymerase II is competent to start RNAsynthesis at a promoter.

P P P P

P P P P

RNA

RNA polymerase

capping factors

splicingfactors

polyadenylationfactors

Figure 6–24 The reactions that capthe 5¢¢ end of each RNA moleculesynthesized by RNA polymerase II.The final cap contains a novel 5¢-to-5¢

linkage between the positively charged 7-methyl G residue and the 5¢ end of theRNA transcript (see Figure 6–22B).Theletter N represents any one of the fourribonucleotides, although the nucleotidethat starts an RNA chain is usually apurine (an A or a G). (After A.J. Shatkin,BioEssays 7:275–277, 1987. © ICSU Press.)

pppNpNp

ppNpNp

GpppNpNp

CH3 GpppNpNp

GTP

Pi

5¢ end of nascent RNA transcript

5¢ 3¢

PPi

add methylgroup to base

+

CH3 GpppNpNp+

CH3

add methylgroup to ribose(only on some caps)

Page 20: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

splicing would severely harm the cell, as they would result in malfunctioningproteins. We see in Chapter 7 that when rare splicing mistakes do occur, the cellhas a “fail-safe” device to eliminate the incorrectly spliced mRNAs.

It may seem wasteful to remove large numbers of introns by RNA splicing. Inattempting to explain why it occurs, scientists have pointed out that theexon–intron arrangement would seem to facilitate the emergence of new anduseful proteins. Thus, the presence of numerous introns in DNA allows geneticrecombination to readily combine the exons of different genes (see p. 462),allowing genes for new proteins to evolve more easily by the combination ofparts of preexisting genes. This idea is supported by the observation, describedin Chapter 3, that many proteins in present-day cells resemble patchworks com-posed from a common set of protein pieces, called protein domains.

RNA splicing also has a present-day advantage. The transcripts of manyeucaryotic genes (estimated at 60% of genes in humans) are spliced in a varietyof different ways to produce a set of different mRNAs, thereby allowing a corre-sponding set of different proteins to be produced from the same gene (Figure6–27). We discuss additional examples of alternative splicing in Chapter 7, as thisis also one of the mechanisms that cells use to change expression of their genes.Rather than being the wasteful process it may have seemed at first sight, RNAsplicing enables eucaryotes to increase the already enormous coding potentialof their genomes. We shall return to this idea several times in this chapter andthe next, but we first need to describe the cellular machinery that performs thisremarkable task.

318 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

Figure 6–26 The RNA splicingreaction. (A) In the first step, a specificadenine nucleotide in the intron sequence(indicated in red) attacks the 5¢ splice siteand cuts the sugar-phosphate backbone ofthe RNA at this point.The cut 5¢ end ofthe intron becomes covalently linked tothe adenine nucleotide, as shown in detailin (B), thereby creating a loop in the RNAmolecule.The released free 3¢-OH end ofthe exon sequence then reacts with thestart of the next exon sequence, joiningthe two exons together and releasing theintron sequence in the shape of a lariat.The two exon sequences thereby becomejoined into a continuous coding sequence;the released intron sequence is degradedin due course.

O

O A

PO

OH

O

O

O_

O

OHO

O

PO

O

O_

O

PO

O

O_

O

OH

OP

O

O_

O

PO

O

O_

O

OH

O

O

PO O_

G

U

3¢ 2¢

5¢ end of intronsequence

3¢ end of intronsequence

excised intronsequence in formof a lariat

A

OH

AOH

AHO

5¢ 3¢

+

lariat

5¢ exonsequence

intronsequence 3¢ exon

sequence2¢

(A) (B)

human β-globin gene

1 2 3

2000nucleotide pairs

human Factor VIII gene

exons

200,000 nucleotide pairs

1 5 10 14 22 25 26

(A) (B)

Figure 6–25 Structure of two human genes showing the arrangement of exons and introns.(A) The relatively small b-globin gene, which encodes one of the subunits of the oxygen-carrying proteinhemoglobin, contains 3 exons (see also Figure 4–7). (B) The much larger Factor VIII gene contains 26 exons; itcodes for a protein (Factor VIII) that functions in the blood-clotting pathway. Mutations in this gene areresponsible for the most prevalent form of hemophilia.

Page 21: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

Nucleotide Sequences Signal Where Splicing Occurs

Introns range in size from about 10 nucleotides to over 100,000 nucleotides.Picking out the precise borders of an intron is very difficult for scientists to do(even with the aid of computers) when confronted by a complete genomesequence of a eucaryote. The possibility of alternative splicing compounds theproblem of predicting protein sequences solely from a genome sequence. Thisdifficulty constitutes one of the main barriers to identifying all of the genes in acomplete genome sequence, and it is the primary reason that we know only theapproximate number of genes in, for example, the human genome. Yet each cellin our body recognizes and rapidly excises the appropriate intron sequenceswith high fidelity. We have seen that intron sequence removal involves threepositions on the RNA: the 5¢ splice site, the 3¢ splice site, and the branch point inthe intron sequence that forms the base of the excised lariat. In pre-mRNA splic-ing, each of these three sites has a consensus nucleotide sequence that is simi-lar from intron to intron, providing the cell with cues on where splicing is to takeplace (Figure 6–28). However, there is enough variation in each sequence tomake it very difficult for scientists to pick out all of the many splicing signals ina genome sequence.

RNA Splicing Is Performed by the Spliceosome

Unlike the other steps of mRNA production we have discussed, RNA splicing isperformed largely by RNA molecules instead of proteins. RNA molecules recog-nize intron–exon borders and participate in the chemistry of splicing. TheseRNA molecules are relatively short (less than 200 nucleotides each), and thereare five of them (U1, U2, U4, U5, and U6) involved in the major form of pre-mRNA splicing. Known as snRNAs (small nuclear RNAs), each is complexedwith at least seven protein subunits to form a snRNP (small nuclear ribonucleo-protein). These snRNPs form the core of the spliceosome, the large assembly ofRNA and protein molecules that performs pre-mRNA splicing in the cell.

FROM DNA TO RNA 319

Figure 6–27 Alternative splicing ofthe aa-tropomyosin gene from rat.a-tropomyosin is a coiled-coil protein (seeFigure 3–11) that regulates contraction inmuscle cells.The primary transcript can bespliced in different ways, as indicated inthe figure, to produce distinct mRNAs,which then give rise to variant proteins.Some of the splicing patterns are specificfor certain types of cells. For example, thea-tropomyosin made in striated muscle isdifferent from that made from the samegene in smooth muscle.The arrowheads in the top part of the figure demark thesites where cleavage and poly-A additioncan occur.

5¢ 3¢

5¢ 3¢

5¢ 3¢

5¢ 3¢

smooth muscle mRNA

fibroblast mRNA

fibroblast mRNA

5¢ 3¢ brain mRNA

striated muscle mRNA

5¢3¢

3¢5¢

DNA

α-tropomyosin gene

exons introns

TRANSCRIPTION, SPLICING, AND3¢ CLEAVAGE/POLYADENYLATION

Figure 6–28 The consensusnucleotide sequences in an RNAmolecule that signal the beginningand the end of most introns inhumans. Only the three blocks ofnucleotide sequences shown are requiredto remove an intron sequence; the rest ofthe intron can be occupied by anynucleotide. Here A, G, U, and C are thestandard RNA nucleotides; R stands foreither A or G;Y stands for either C or U.The A highlighted in red forms the branchpoint of the lariat produced by splicing.Only the GU at the start of the intronand the AG at its end are invariantnucleotides in the splicing consensussequences.The remaining positions (eventhe branch point A) can be occupied by avariety of nucleotides, although theindicated nucleotides are preferred.Thedistances along the RNA between thethree splicing consensus sequences arehighly variable; however, the distancebetween the branch point and 3¢ splicejunction is typically much shorter thanthat between the 5¢ splice junction andthe branch point.

– – – AG GURAGU – – – – CTRAYY – .... – YYYYYYYYNCAG G – – –

– – – AG G – – –

sequences required for intron removal

INTRON REMOVED

5¢ 3¢

3¢5¢

portion of aprimary transcript

exon 1

exon 1 exon 2

exon 2intron

portion ofmRNA

Page 22: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

The spliceosome is a dynamic machine; as we see below, it is assembled onpre-mRNA from separate components, and parts enter and leave it as the splic-ing reaction proceeds (Figure 6–29). During the splicing reaction, recognition ofthe 5¢ splice junction, the branch point site and the 3¢ splice junction is per-formed largely through base-pairing between the snRNAs and the consensusRNA sequences in the pre-mRNA substrate (Figure 6–30). In the course of splic-ing, the spliceosome undergoes several shifts in which one set of base-pair inter-actions is broken and another is formed in its place. For example, U1 is replacedby U6 at the 5¢ splice junction (see Figure 6–30A). As we shall see, this type ofRNA–RNA rearrangement (in which the formation of one RNA–RNA interactionrequires the disruption of another) occurs several times during the splicing reac-tion. It permits the checking and rechecking of RNA sequences before the chem-ical reaction is allowed to proceed, thereby increasing the accuracy of splicing.

320 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

Figure 6–29 The RNA splicingmechanism. RNA splicing is catalyzed byan assembly of snRNPs (shown as coloredcircles) plus other proteins (most of whichare not shown), which together constitutethe spliceosome.The spliceosomerecognizes the splicing signals on a pre-mRNA molecule, brings the two endsof the intron together, and provides theenzymatic activity for the two reactionsteps (see Figure 6–26).The branch-pointsite is first recognized by the BBP (branch-point binding protein) and U2AF, a helperprotein. In the next steps, the U2 snRNPdisplaces BBP and U2AF and forms basepairs with the branch-point site consensussequence, and the U1 snRNP forms base-pairs with the 5¢ splice junction (seeFigure 6–30). At this point, the U4/U6∑U5“triple” snRNP enters the spliceosome. Inthis triple snRNP, the U4 and U6 snRNAsare held firmly together by base-pairinteractions and the U5 snRNP is moreloosely associated. Several RNA–RNArearrangements then occur that breakapart the U4/U6 base pairs (as shown, theU4 snRNP is ejected from the splicesomebefore splicing is complete) and allow theU6 snRNP to displace U1 at the 5¢ splicejunction (see Figure 6–30). Subsequentrearrangements create the active site ofthe spliceosome and position theappropriate portions of the pre-mRNAsubstrate for the splicing reaction tooccur.Although not shown in the figure,each splicing event requires additionalproteins, some of which hydrolyze ATPand promote the RNA–RNArearrangements.

U1 snRNP

intron

U2 snRNP

5¢ 3¢

A

OH3¢

excised intron sequencein the form of a lariat(intron RNA will be degradedin the nucleus; snRNPs willbe recycled)

5¢ 3¢portion of mRNA

3¢ SPLICE SITECLEAVAGE ANDJOINING OF TWOEXON SEQUENCES

5¢ 3¢3¢

AOH

LARIAT FORMATIONAND 5' SPLICE SITECLEAVAGE

A

A

U4/U6 •U5 “triple”snRNP

U4/U6 snRNP

U6 snRNP

U5 snRNP

lariat

exon 1 exon 2

5¢ splice site

exon 1

BBPU2AF

intron5¢

3¢ splice site

BBP U2AFexon 2

portion of a pre-mRNA transcript

A

U1 snRNPU2 snRNP

U1, U4

Page 23: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

The Spliceosome Uses ATP Hydrolysis to Produce a Complex Series of RNA–RNA Rearrangements

Although ATP hydrolysis is not required for the chemistry of RNA splicing per se,it is required for the stepwise assembly and rearrangements of the spliceosome.Some of the additional proteins that make up the spliceosome are RNA helicases,which use the energy of ATP hydrolysis to break existing RNA–RNA interactionsso as to allow the formation of new ones. In fact, all the steps shown previouslyin Figure 6–29—except the association of BBP with the branch-point site and U1snRNP with the 5¢ splice site—require ATP hydrolysis and additional proteins. Inall, more than 50 proteins, including those that form the snRNPs, are requiredfor each splicing event.

The ATP-requiring RNA–RNA rearrangements that take place in the spliceo-some occur within the snRNPs themselves and between the snRNPs and the pre-mRNA substrate. One of the most important roles of these rearrangements is thecreation of the active catalytic site of the spliceosome. The strategy of creating anactive site only after the assembly and rearrangement of splicing componentson a pre-mRNA substrate is an important way of preventing wayward splicing.

Perhaps the most surprising feature of the spliceosome is the nature of thecatalytic site itself: it is largely (if not exclusively) formed by RNA moleculesinstead of proteins. In the last section of this chapter we discuss in general termsthe structural and chemical properties of RNA that allow it to perform catalysis;here we need only consider that the U2 and U6 snRNAs in the spliceosome forma precise three-dimensional RNA structure that juxtaposes the 5¢ splice site ofthe pre-mRNA with the branch-point site and probably performs the first trans-esterification reaction (see Figure 6–30C). In a similar way, the 5¢ and 3¢ splicejunctions are brought together (an event requiring the U5 snRNA) to facilitatethe second transesterification.

FROM DNA TO RNA 321

C

G

A

U

U

A

U

U

C

G

A

U

U1

exon 1

5¢ GU A UG

G A G A C A

U

U6

exon 1

5¢ 3¢3¢rearrangement

(A)

U A C U A A C

exon 2

U2

3¢ U A C U AAC

exon 2

5¢ 3¢

A UG A UGrearrangement

rearrangement

(B)

(C)

BBP

AA

C

C

U AA

G

G

GA

A

A

A

UGGG

G

G

U

U

U

A

AA

U C

C

CUU

G

G A UGU

C U AAC A

U

A

A

U

exon 1

exon 2

U6

U2 AA

C

C

U AA

G

G

GA

A

A

A

UGGG

G

G

U

U

U

A

AA

U C

C

CUU

G

G A UGU

C U A C A

U

A

A

U

exon 1

exon 2

U6

U2

AU

UU5

CC

Figure 6–30 Several of therearrangements that take place inthe spliceosome during pre-mRNAsplicing. Shown here are the details forthe yeast Saccharomyces cerevisiae, in whichthe nucleotide sequences involved areslightly different from those in humancells. (A) The exchange of U1 snRNP forU6 snRNP occurs before the firstphosphoryl-transfer reaction (see Figure6–29).This exchange allows the 5¢ splicesite to be read by two different snRNPs,thereby increasing the accuracy of 5¢

splice site selection by the spliceosome.(B) The branch-point site is firstrecognized by BBP and subsequently byU2 snRNP; as in (A), this “check andrecheck” strategy provides increasedaccuracy of site selection.The binding ofU2 to the branch-point forces theappropriate adenine (in red) to beunpaired and thereby activates it for theattack on the 5¢ splice site (see Figure6–29).This, in combination withrecognition by BBP, is the way in which thespliceosome accurately chooses theadenine that is ultimately to form thebranch point. (C) After the firstphosphoryl-transfer reaction (left) hasoccurred, U5 snRNP undergoes arearrangement that brings the two exonsinto close proximity for the secondphosphoryl-transfer reaction (right). ThesnRNAs both position the reactants andprovide (either all or in part) the catalyticsite for the two reactions.The U5 snRNPis present in the spliceosome before thisrearrangement occurs; for clarity it hasbeen omitted from the left panel.Asdiscussed in the text, all of the RNA–RNArearrangements shown in this figure (aswell as others that occur in thespliceosome but are not shown) requirethe participation of additional proteins andATP hydrolysis.

Page 24: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

Once the splicing chemistry is completed, the snRNPs remain bound to thelariat and the spliced product is released. The disassembly of these snRNPs fromthe lariat (and from each other) requires another series of RNA–RNA rearrange-ments that require ATP hydrolysis, thereby returning the snRNAs to their origi-nal configuration so that they can be used again in a new reaction.

Ordering Influences in the Pre-mRNA Help to Explain How the Proper Splice Sites Are Chosen

As we have seen, intron sequences vary enormously in size, with some being inexcess of 100,000 nucleotides. If splice-site selection were determined solely bythe snRNPs acting on a preformed, protein-free RNA molecule, we would expectsplicing mistakes—such as exon skipping and the use of cryptic splice sites—tobe very common (Figure 6–31).

The fidelity mechanisms built into the spliceosome are supplemented bytwo additional factors that help ensure that splicing occurs accurately. Theseordering influences in the pre-mRNA increase the probability that the appropri-ate pairs of 5¢ and 3¢ splice sites will be brought together in the spliceosomebefore the splicing chemistry begins. The first results from the assembly of thespliceosome occurring as the pre-mRNA emerges from a transcribing RNA poly-merase II (see Figure 6–23). As for 5¢ cap formation, several components of thespliceosome seem to be carried on the phosphorylated tail of RNA polymerase.Their transfer directly from the polymerase to the nascent pre-mRNA presum-ably helps the cell to keep track of introns and exons: the snRNPs at a 5¢ splicesite are initially presented with only a single 3¢ splice site since the sites furtherdownstream have not yet been synthesized. This feature helps to preventinappropriate exon skipping.

The second factor that helps the cell to choose splice sites has been termedthe “exon definition hypothesis,” and it is understood only in outline. Exon sizetends to be much more uniform than intron size, averaging about 150nucleotide pairs across a wide variety of eucaryotic organisms (Figure 6–32). AsRNA synthesis proceeds, a group of spliceosome components, called the SR pro-teins (so-named because they contain a domain rich in serines and arginines),are thought to assemble on exon sequences and mark off each 3¢ and 5¢ splicesite starting at the 5¢ end of the RNA (Figure 6–33). This assembly takes place inconjunction with the U1 snRNA, which marks one exon boundary, and U2AF,

322 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

Figure 6–31 Two types of splicingerrors. Both types might be expected tooccur frequently if splice-site selectionwere performed by the spliceosome on apreformed, protein-free RNA molecule.“Cryptic” splicing signals are nucleotidesequences of RNA that closely resembletrue splicing signals.

exon 1

exon 1 exon 3

exon 2 exon 3

exon skipping

5¢ 3¢

exon 1

exon 1portion of exon 2

exon 2

cryptic splicesite selection

5¢ 3¢

“cryptic”splicingsignals

(A) (B)

100 <100 100–2000 2000–5000 5000–30,000 >30,000200 300 400 500 600 700 800 900 1000

1 10

0

20

30

40

50

60

0

2

3

4

5

6

7humanwormfly

humanwormfly

per

cen

tag

e o

f ex

on

s

per

cen

tag

e o

f in

tro

ns

exon length (nucleotide pairs) intron length (nucleotide pairs)

(A) (B)

Figure 6–32 Variation in intron andexon lengths in the human, worm,and fly genomes. (A) Size distribution of exons. (B) Size distribution of introns.Note that exon length is much moreuniform than intron length. (Adapted from International Human GenomeSequencing Consortium, Nature409:860–921, 2001.)

Page 25: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

which initially helps to specify the other. By specifically marking the exons inthis way, the cell increases the accuracy with which the initial splicing compo-nents are deposited on the nascent RNA and thereby helps to avoid crypticsplice sites. How the SR proteins discriminate exon sequences from intronsequences is not understood; however, it is known that some of the SR proteinsbind preferentially to RNA sequences in specific exons. In principle, the redun-dancy in the genetic code could have been exploited during evolution to selectfor binding sites for SR proteins in exons, allowing these sites to be created with-out constraining amino acid sequences.

Both the marking out of exon and intron boundaries and the assembly of thespliceosome begin on an RNA molecule while it is still being elongated by RNApolymerase at its 3¢ end. However, the actual chemistry of splicing can take placemuch later. This delay means that intron sequences are not necessarily removedfrom a pre-mRNA molecule in the order in which they occur along the RNAchain. It also means that, although spliceosome assembly is co-transcriptional,the splicing reactions sometimes occur posttranscriptionally—that is, after acomplete pre-mRNA molecule has been made.

A Second Set of snRNPs Splice a Small Fractionof Intron Sequences in Animals and Plants

Simple eucaryotes such as yeast have only one set of snRNPs that perform allpre-mRNA splicing. However, more complex eucaryotes such as flies, mammals,and plants have a second set of snRNPs that direct the splicing of a small frac-tion of their intron sequences. This minor form of spliceosome recognizes a dif-ferent set of DNA sequences at the 5¢ and 3¢ splice junctions and at the branchpoint; it is called the AT–AC spliceosome because of the nucleotide sequencedeterminants at its intron–exon borders (Figure 6–34). Despite recognizing dif-ferent nucleotide sequences, the snRNPs in this spliceosome make the sametypes of RNA–RNA interactions with the pre-mRNA and with each other as dothe major snRNPs (Figure 6–34B). The recent discovery of this class of snRNPsgives us confidence in the base-pair interactions deduced for the major spliceo-some, because it provides an independent set of molecules that undergo thesame RNA–RNA interactions despite differences in the RNA sequences involved.

A particular variation on splicing, called trans-splicing, has been discoveredin a few eucaryotic organisms. These include the single-celled trypanosomes—protozoans that cause African sleeping sickness in humans—and the modelmulticellular organism, the nematode worm. In trans-splicing, exons from twoseparate RNA transcripts are spliced together to form a mature mRNA molecule(see Figure 6–34). Trypanosomes produce all of their mRNAs in this way, where-as only about 1% of nematode mRNAs are produced by trans-splicing. In bothcases, a single exon is spliced onto the 5¢ end of many different RNA transcriptsproduced by the cell; in this way, all of the products of trans-splicing have thesame 5¢ exon and different 3¢ exons. Many of the same snRNPs that function inconventional splicing are used in this reaction, although trans-splicing uses aunique snRNP (called the SL RNP) that brings in the common exon (see Figure6–34).

FROM DNA TO RNA 323

Figure 6–33 The exon definitionhypothesis. According to one proposal,SR proteins bind to each exon sequencein the pre-mRNA and thereby help toguide the snRNPs to the properintron/exon boundaries.This demarcationof exons by the SR proteins occurs co-transcriptionally, beginning at the CBC(cap-binding complex) at the 5¢ end.Asindicated, the intron sequences in the pre-mRNA, which can be extremely long,are packaged into hnRNP (heterogeneousnuclear ribonucleoprotein) complexes thatcompact them into more manageablestructures and perhaps mask cryptic splicesites. Each hnRNP complex forms aparticle approximately twice the diameterof a nucleosome, and the core iscomposed of a set of at least eightdifferent proteins. It has been proposedthat hnRNP proteins preferentiallyassociate with intron sequences and thatthis preference also helps the spliceosomedistinguish introns from exons. However,as shown, at least some hnRNP proteinsmay bind to exon sequences but theirrole, if any, in exon definition has yet to beestablished. (Adapted from R. Reed, Curr.Opin. Cell Biol. 12:340–345, 2000.)

U1 U2

SRCBC

U1 U2

poly-A-bindingproteins

hnRNPcomplexes

intron10–105 nucleotides

intron10–105 nucleotides

exon ~200nucleotides

5¢ 3¢

hnRNP

Page 26: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

The reason that a few organisms use trans-splicing is not known; however, itis thought that the common 5¢ exon may aid in the translation of the mRNA.Thus, the products of trans-splicing in nematodes seem to be translated withespecially high efficiency.

RNA Splicing Shows Remarkable Plasticity

We have seen that the choice of splice sites depends on many features of the pre-mRNA transcript; these include the affinity of the three signals on the RNA (the5¢ and 3¢ splice junctions and branch point) for the splicing machinery, thelength and nucleotide sequence of the exon, the co-transcriptional assembly ofthe spliceosome, and the accuracy of the “bookkeeping” that underlies exon def-inition. So far we have emphasized the accuracy of the RNA splicing processesthat occur in a cell. But it also seems that the mechanism has been selected forits flexibility, which allows the cell to try out new proteins on occasion. Thus, forexample, when a mutation occurs in a nucleotide sequence critical for splicingof a particular intron, it does not necessarily prevent splicing of that intron alto-gether. Instead, the mutation typically creates a new pattern of splicing (Figure6–35). Most commonly, an exon is simply skipped (Figure 6–35B). In other cases,the mutation causes a “cryptic” splice junction to be used (Figure 6–35C). Pre-sumably, the splicing machinery has evolved to pick out the best possible pat-tern of splice junctions, and if the optimal one is damaged by mutation, it willseek out the next best pattern and so on. This flexibility in the process of RNAsplicing suggests that changes in splicing patterns caused by random mutationshave been an important pathway in the evolution of genes and organisms.

The plasticity of RNA splicing also means that the cell can easily regulate thepattern of RNA splicing. Earlier in this section we saw that alternative splicing

324 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

U1 U2

OH

A AGGU

U4/U6 •U5

exon 1 exon 2

U1, U4

U6

U5

U2

lariat structureand snRNPs

A

G

GUA

U11 U12

OH

A ACA U

U4 AT–AC/U6 AT–AC •U5

exon 1 exon 2

U11, U4 AT–AC

U6 AT–AC

U5

U12

lariat structureand snRNPs

A

C

AUA

U2

OH

A AGGSL

SL

SL

U

exon 1 exon 2

U4

U6

U5

U2

branched structureand snRNPs

A

G

GUA

U5U4

U6

(A)

(B)

GU R A G

G A G A C A

U

U6

U C C U

G AG A

AA U

GG A A

U

U6 AT-AC

MAJOR AT–AC

MAJOR AT–AC

TRANS

SL RNP

exon 1 exon 1

Figure 6–34 Outline of themechanisms used for three types ofRNA splicing. (A) Three types ofspliceosomes.The major spliceosome(left), the AT–AC spliceosome (middle),and the trans-spliceosome (right) are eachshown at two stages of assembly.The U5snRNP is the only component that iscommon to all three spliceosomes. Intronsremoved by the AT–AC spliceosome havea different set of consensus nucleotidesequences from those removed by themajor spliceosome. In humans, it isestimated that 0.1% of introns areremoved by the AT–AC spliceosome. Intrans-splicing, the SL snRNP is consumedin the reaction because a portion of theSL snRNA becomes the first exon of themature mRNA. (B) The major U6 snRNPand the U6 AT–AC snRNP both recognizethe 5¢ splice junction, but they do sothrough a different set of base-pairinteractions.The sequences shown arefrom humans. (Adapted from Y.-T.Yu et al.,The RNA World, pp. 487–524. Cold SpringHarbor, New York: Cold Spring HarborLaboratory Press, 1999.)

Page 27: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

can give rise to different proteins from the same gene. Some examples of alter-native splicing are constitutive; that is, the alternatively spliced mRNAs areproduced continuously by cells of an organism. However, in most cases, thesplicing patterns are regulated by the cell so that different forms of the proteinare produced at different times and in different tissues (see Figure 6–27). InChapter 7 we return to this issue to discuss some specific examples of regulatedRNA splicing.

Spliceosome-catalyzed RNA Splicing Probably Evolved from Self-splicing Mechanisms

When the spliceosome was first discovered, it puzzled molecular biologists. Whydo RNA molecules instead of proteins perform important roles in splice siterecognition and in the chemistry of splicing? Why is a lariat intermediate usedrather than the apparently simpler alternative of bringing the 5¢ and 3¢ splicesites together in a single step, followed by their direct cleavage and rejoining?The answers to these questions reflect the way in which the spliceosome isbelieved to have evolved.

As discussed briefly in Chapter 1 (and taken up again in more detail in thefinal section of this chapter), it is thought that early cells used RNA moleculesrather than proteins as their major catalysts and that they stored their geneticinformation in RNA rather than in DNA sequences. RNA-catalyzed splicing reac-tions presumably had important roles in these early cells. As evidence, someself-splicing RNA introns (that is, intron sequences in RNA whose splicing outcan occur in the absence of proteins or any other RNA molecules) remaintoday—for example, in the nuclear rRNA genes of the ciliate Tetrahymena, in afew bacteriophage T4 genes, and in some mitochondrial and chloroplast genes.

A self-splicing intron sequence can be identified in a test tube by incubatinga pure RNA molecule that contains the intron sequence and observing thesplicing reaction. Two major classes of self-splicing intron sequences can bedistinguished in this way. Group I intron sequences begin the splicing reactionby binding a G nucleotide to the intron sequence; this G is thereby activated toform the attacking group that will break the first of the phosphodiester bondscleaved during splicing (the bond at the 5¢ splice site). In group II intronsequences, an especially reactive A residue in the intron sequence is the attack-ing group, and a lariat intermediate is generated. Otherwise the reaction path-ways for the two types of self-splicing intron sequences are the same. Both arepresumed to represent vestiges of very ancient mechanisms (Figure 6–36).

For both types of self-splicing reactions, the nucleotide sequence of theintron is critical; the intron RNA folds into a specific three-dimensional struc-ture, which brings the 5¢ and 3¢ splice junctions together and provides preciselypositioned reactive groups to perform the chemistry (see Figure 6–6C). Based onthe fact that the chemistries of their splicing reactions are so similar, it has beenproposed that the pre-mRNA splicing mechanism of the spliceosome evolvedfrom group II splicing. According to this idea, when the spliceosomal snRNPstook over the structural and chemical roles of the group II introns, the strictsequence constraints on intron sequences would have disappeared, therebypermitting a vast expansion in the number of different RNAs that could bespliced.

FROM DNA TO RNA 325

exon1

exon2

exon3

intron sequences

(A) NORMAL ADULT b-GLOBIN PRIMARY RNA TRANSCRIPT

(B) SOME SINGLE-NUCLEOTIDE CHANGES THAT DESTROY A NORMAL SPLICE SITE CAUSE EXON SKIPPING

normal mRNA is formed from three exons

mRNA with exon 2 missing

(C) SOME SINGLE-NUCLEOTIDE CHANGES THAT DESTROY NORMAL SPLICE SITES ACTIVATE CRYPTIC SPLICE SITES

mRNA with extended exon 3

(D) SOME SINGLE-NUCLEOTIDE CHANGES THAT CREATE NEW SPLICE SITES CAUSE NEW EXONS TO BE INCORPORATED

mRNA with extra exon insertedbetween exon 2 and exon 3

Figure 6–35 Abnormal processing of the bb-globin primary RNAtranscript in humans with the disease bb thalassemia. In the examplesshown, the disease is caused by splice-site mutations, denoted by blackarrowheads.The dark blue boxes represent the three normal exon sequences;the red lines are used to indicate the 5¢ and 3¢ splice sites that are used insplicing the RNA transcript.The light blue boxes depict new nucleotidesequences included in the final mRNA molecule as a result of the mutation.Note that when a mutation leaves a normal splice site without a partner, anexon is skipped or one or more abnormal “cryptic” splice sites nearby isused as the partner site, as in (C) and (D). (Adapted in part from S.H. Orkin,in The Molecular Basis of Blood Diseases [G. Stamatoyannopoulos et al.,eds.], pp. 106–126. Philadelphia: Saunders, 1987.)

Page 28: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

326 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

Figure 6–37 Consensus nucleotidesequences that direct cleavage andpolyadenylation to form the 3¢¢ end ofa eucaryotic mRNA. These sequencesare encoded in the genome and arerecognized by specific proteins after theyare transcribed into RNA.The hexamerAAUAAA is bound by CPSF, the GU-richelement beyond the cleavage site by CstF(see Figure 6–38), and the CA sequenceby a third factor required for the cleavagestep. Like other consensus nucleotidesequences discussed in this chapter (seeFigure 6–12), the sequences shown in thefigure represent a variety of individualcleavage and polyadenylation signals.

– AAUAAA CA GU-rich or U-rich

– AAUAAA CA AAAAA – – – – – – – A

– AAUAAA CA OH

10–30 nucleotides < 30 nucleotides

CLEAVAGE

Poly-AADDITION

degraded inthe nucleus

~250

GU-rich or U-rich

OH

RNA-Processing Enzymes Generate the 3¢ End of Eucaryotic mRNAs

As previously explained, the 5¢ end of the pre-mRNA produced by RNA poly-merase II is capped almost as soon as it emerges from the RNA polymerase.Then, as the polymerase continues its movement along a gene, the spliceosomecomponents assemble on the RNA and delineate the intron and exon bound-aries. The long C-terminal tail of the RNA polymerase coordinates these pro-cesses by transferring capping and splicing components directly to the RNA asthe RNA emerges from the enzyme. As we see in this section, as RNA polymeraseII terminates transcription at the end of a gene, it uses a similar mechanism toensure that the 3¢ end of the pre-mRNA becomes appropriately processed.

As might be expected, the 3¢ ends of mRNAs are ultimately specified by DNAsignals encoded in the genome (Figure 6–37). These DNA signals are transcribed

OH

5¢ 3¢

+5¢ 3¢

+

GHO3¢

5¢ 3¢

G

5¢ 3¢OH

OH

lariat

5¢ exonsequence

intronsequence

3¢ exonsequence

precursorRNA molecule

transientintermediate

excisedintron

sequence

ligated exonsequences

Group I self-splicing intron sequences Group II self-splicing intron sequences

5¢ exonsequence

intronsequence

3¢ exonsequence A

A

G A

HO

OH

Figure 6–36 The two known classesof self-splicing intron sequences. Thegroup I intron sequences bind a free G nucleotide to a specific site on theRNA to initiate splicing, while the group IIintron sequences use an especially reactiveA nucleotide in the intron sequence itselffor the same purpose.The twomechanisms have been drawn toemphasize their similarities. Both arenormally aided in the cell by proteins thatspeed up the reaction, but the catalysis isnevertheless mediated by the RNA in theintron sequence. Both types of self-splicingreactions require the intron to be foldedinto a highly specific three-dimensionalstructure that provides the catalyticactivity for the reaction (see Figure 6–6).The mechanism used by group II intronsequences releases the intron as a lariatstructure and closely resembles thepathway of pre-mRNA splicing catalyzedby the spliceosome (compare with Figure6–29).The great majority of RNA splicingin eucaryotic cells is performed by thespliceosome, and self-splicing RNAsrepresent unusual cases. (Adapted fromT.R. Cech, Cell 44:207–210, 1986.)

Page 29: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

into RNA as the RNA polymerase II moves through them, and they are thenrecognized (as RNA) by a series of RNA-binding proteins and RNA-processingenzymes (Figure 6–38). Two multisubunit proteins, called CstF (cleavage stimu-lation factor F) and CPSF (cleavage and polyadenylation specificity factor), areof special importance. Both of these proteins travel with the RNA polymerase tailand are transferred to the 3¢ end processing sequence on an RNA molecule as itemerges from the RNA polymerase. Some of the subunits of CPSF are associatedwith the general transcription factor TFIID, which, as we saw earlier in this chap-ter, is involved in transcription initiation. During transcription initiation, thesesubunits may be transferred from TFIID to the RNA polymerase tail, remainingassociated there until the polymerase has transcribed through the end of a gene.

Once CstF and CPSF bind to specific nucleotide sequences on an emergingRNA molecule, additional proteins assemble with them to perform the process-ing that creates the 3¢ end of the mRNA. First, the RNA is cleaved (see Figure6–38). Next an enzyme called poly-A polymerase adds, one at a time, approxi-mately 200 A nucleotides to the 3¢ end produced by the cleavage. The nucleotideprecursor for these additions is ATP, and the same type of 5¢-to-3¢ bonds areformed as in conventional RNA synthesis (see Figure 6–4). Unlike the usual RNApolymerases, poly-A polymerase does not require a template; hence the poly-Atail of eucaryotic mRNAs is not directly encoded in the genome. As the poly-Atail is synthesized, proteins called poly-A-binding proteins assemble onto it and,by a poorly understood mechanism, determine the final length of the tail. Poly-A-binding proteins remain bound to the poly-A tail as the mRNA makes its jour-ney from the nucleus to the cytosol and they help to direct the synthesis of aprotein on the ribosome, as we see later in this chapter.

After the 3¢ end of a eucaryotic pre-mRNA molecule has been cleaved, theRNA polymerase II continues to transcribe, in some cases continuing as many asseveral hundred nucleotides beyond the DNA that contains the 3¢ cleavage-siteinformation. But the polymerase soon releases its grip on the template and tran-scription terminates; the piece of RNA downstream of the cleavage site is thendegraded in the cell nucleus. It is not yet understood what triggers the loss inpolymerase II processivity after the RNA is cleaved. One idea is that the transferof the 3¢ end processing factors from the RNA polymerase to the RNA causes aconformational change in the polymerase that loosens its hold on DNA; anotheris that the lack of a cap structure (and the CBC) on the 5¢ end of the RNA thatemerges from the polymerase somehow signals to the polymerase to terminatetranscription.

Mature Eucaryotic mRNAs Are Selectively Exported from the Nucleus

We have seen how eucaryotic pre-mRNA synthesis and processing takes place inan orderly fashion within the cell nucleus. However, these events create a specialproblem for eucaryotic cells, especially those of complex organisms where theintrons are vastly longer than the exons. Of the pre-mRNA that is synthesized,only a small fraction—the mature mRNA—is of further use to the cell. The rest—excised introns, broken RNAs, and aberrantly spliced pre-mRNAs—is not onlyuseless but could be dangerous if it was not destroyed. How then does the celldistinguish between the relatively rare mature mRNA molecules it wishes tokeep and the overwhelming amount of debris from RNA processing? The answeris that transport of mRNA from the nucleus to the cytoplasm, where it is trans-lated into protein, is highly selective—being closely coupled to correct RNA pro-cessing. This coupling is achieved by the nuclear pore complex, which recognizesand transports only completed mRNAs.

We have seen that as a pre-mRNA molecule is synthesized and processed, itis bound by a variety of proteins, including the cap-binding complex, the SRproteins, and the poly-A binding proteins. To be “export-ready,” it seems than anmRNA must be bound by the appropriate set of proteins—with certain proteinssuch as the cap-binding complex being present, and others such as snRNP pro-teins absent. Additional proteins, placed on the RNA during splicing, seem to

FROM DNA TO RNA 327

Figure 6–38 Some of the major stepsin generating the 3¢¢ end of aeucaryotic mRNA. This process ismuch more complicated than theanalogous process in bacteria, where theRNA polymerase simply stops at atermination signal and releases both the 3¢ end of its transcript and the DNAtemplate (see Figure 6–10).

AAAUAA

P P P P

P P P P

RNA

RNA polymerase

CstFCPSF

cleavageand poly-Asignalsencodedin DNA

AAUAAA

CPSFPAP

RNApolymeraseeventuallyterminates

AAAAAAAAAAAA

AAUAAA AAAAAAAAAAAAA

mature 3¢ end ofan mRNA molecule

additionalcleavagefactors

poly-A polymerase(PAP)

poly-A-bindingprotein

additionalpoly-A-bindingproteinPOLY-A LENGTH

REGULATION

RNA CLEAVED

200AAAAAAAAAAAAAA

Page 30: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

mark exon-exon boundaries and thereby signify completed splicing events.Only if the proper set of proteins is bound to an mRNA is it guided through thenuclear pore complex into the cytosol. As described in Chapter 12, nuclearpore complexes are aqueous channels in the nuclear membrane that directlyconnect the nucleoplasm and cytosol. Small molecules (less than 50,000 dal-tons) can diffuse freely through them. However, most of the macromolecules incells, including mRNAs complexed with proteins, are far too large to passthrough the pores without a special process to move them. An active transportof substances through the nuclear pore complexes occurs in both directions. Asexplained in Chapter 12, signals on the macromolecule determine whether it isexported from the nucleus (a mRNA, for example) or imported into it (an RNApolymerase, for example). For the case of mRNAs, the bound proteins that markcompleted splicing events are of particular importance, as they are known toserve directly as RNA export factors (see Figure 12–16). mRNAs transcribed fromgenes that lack introns apparently contain nucleotide sequences that aredirectly recognized by other RNA export factors. Eucaryotic cells thus use theirnuclear pore complexes as gates that allow only useful RNA molecules to enterthe cytoplasm.

Of all the proteins that assemble on pre-mRNA molecules as they emergefrom transcribing RNA polymerases, the most abundant are the hnRNPs (het-erogeneous nuclear ribonuclear proteins). Some of these proteins (there areapproximately 30 of them in humans) remove the hairpin helices from the RNAso that splicing and other signals on the RNA can be read more easily. Otherspackage the RNA contained in the very long intron sequences typically found ingenes of complex organisms (see Figure 6–33). Apart from histones, certainhnRNP proteins are the most abundant proteins in the cell nucleus, and theymay play a particularly important role in distinguishing mature mRNA fromprocessing debris. hnRNP particles (nucleosome-like complexes of hnRNP pro-teins and RNA—see Figure 6–33) are largely excluded from exon sequences, per-haps by prior binding of spliceosome components. They remain on excisedintrons and probably help mark them for nuclear retention and eventualdestruction.

The export of mRNA–protein complexes from the nucleus can be observedwith an electron microscope for the unusually abundant mRNA of the insectBalbiani Ring genes. As these genes are transcribed, the newly formed RNA isseen to be packaged by proteins (including hnRNP and SR proteins). This pro-tein–RNA complex undergoes a series of structural transitions, probably reflect-ing RNA processing events, culminating in a curved fiber (Figure 6–39). Thiscurved fiber then moves through the nucleoplasm and enters the nuclear porecomplex (with its 5¢ cap proceeding first), and it undergoes another series ofstructural transitions as it moves through the NPC. These and other observa-tions reveal that the pre-mRNA–protein and mRNA–protein complexes aredynamic structures that gain and lose numerous specific proteins during RNAsynthesis, processing, export, and translation (Figure 6–40).

Before discussing what happens to mRNAs after they leave the nucleus, webriefly consider how the synthesis and processing of noncoding RNA molecules

328 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

nuclear porecomplex

chromatin

(A)

TRANSCRIPTION

RNA as it emerges from RNA polymerase

“export-ready”RNA

NUCLEUS CYTOPLASM

200 nm(B)

NUCLEUS CYTOSOL

Figure 6–39 Transport of a largemRNA molecule through thenuclear pore complex. (A) Thematuration of a Balbiani Ring mRNAmolecule as it is synthesized by RNApolymerase and packaged by a variety ofnuclear proteins.This drawing of unusuallyabundant RNA produced by an insect cellis based on EM micrographs such as thatshown in (B). Balbiani Rings are describedin Chapter 4. (A, adapted from B. Daneholt, Cell 88:585–588, 1997; B, fromB.J. Stevens and H. Swift, J. Cell Biol.31:55–77, 1966. © The RockefellerUniversity Press.)

Page 31: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

occurs. Although there are many other examples, our discussion focuses on therRNAs that are critically important for the translation of mRNAs into protein.

Many Noncoding RNAs Are Also Synthesized and Processed in the Nucleus

A few per cent of the dry weight of a mammalian cell is RNA; of that, only about3–5% is mRNA. A fraction of the remainder represents intron sequences beforethey have been degraded, but most of the RNA in cells performs structural andcatalytic functions (see Table 6–1, p. 306). The most abundant RNAs in cells arethe ribosomal RNAs (rRNAs)—constituting approximately 80% of the RNA inrapidly dividing cells. As discussed later in this chapter, these RNAs form thecore of the ribosome. Unlike bacteria—in which all RNAs in the cell are synthe-sized by a single RNA polymerase—eucaryotes have a separate, specialized poly-merase, RNA polymerase I, that is dedicated to producing rRNAs. RNA poly-merase I is similar structurally to the RNA polymerase II discussed previously;however, the absence of a C-terminal tail in polymerase I helps to explain whyits transcripts are neither capped nor polyadenylated. As discussed earlier, thisdifference helps the cell distinguish between noncoding RNAs and mRNAs.

Because multiple rounds of translation of each mRNA molecule can providean enormous amplification in the production of protein molecules, many of theproteins that are very abundant in a cell can be synthesized from genes that arepresent in a single copy per haploid genome. In contrast, the RNA componentsof the ribosome are final gene products, and a growing mammalian cell mustsynthesize approximately 10 million copies of each type of ribosomal RNA ineach cell generation to construct its 10 million ribosomes. Adequate quantitiesof ribosomal RNAs can be produced only because the cell contains multiplecopies of the rRNA genes that code for ribosomal RNAs (rRNAs). Even E. colineeds seven copies of its rRNA genes to meet the cell’s need for ribosomes.Human cells contain about 200 rRNA gene copies per haploid genome, spreadout in small clusters on five different chromosomes (see Figure 4–11), while cells

FROM DNA TO RNA 329

AAAAAAAAA

SR

RNAexportfactor CBC

nuclear pore complex

hnRNPproteins

poly-A-bindingproteins

NUCLEUS CYTOSOL

nuclear-restrictedproteins

AA

AA

AAA

AAAAAAAA

5¢ cap

200

eIF-4GeIF-4E

AA

AA

AAAA TRANSLATION

export-ready mRNA

initiation factors forprotein synthesis

Figure 6–40 Schematic illustration of an “export-ready” mRNA molecule and its transportthrough the nuclear pore. As indicated, some proteins travel with the mRNA as it moves through thepore, whereas others remain in the nucleus. Once in the cytoplasm, the mRNA continues to shed previouslybound proteins and acquire new ones; these substitutions affect the subsequent translation of the message.Because some are transported with the RNA, the proteins that become bound to an mRNA in the nucleuscan influence its subsequent stability and translation in the cytosol. RNA export factors, shown in thenucleus, play an active role in transporting the mRNA to the cytosol (see Figure 12–16). Some are depositedat exon-exon boundaries as splicing is completed, thus signifying those regions of the RNA that have beenproperly spliced.

Page 32: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

of the frog Xenopus contain about 600 rRNA gene copies per haploid genome ina single cluster on one chromosome (Figure 6–41).

There are four types of eucaryotic rRNAs, each present in one copy per ribo-some. Three of the four rRNAs (18S, 5.8S, and 28S) are made by chemically mod-ifying and cleaving a single large precursor rRNA (Figure 6–42); the fourth (5SRNA) is synthesized from a separate cluster of genes by a different polymerase,RNA polymerase III, and does not require chemical modification. It is not knownwhy this one RNA is transcribed separately.

Extensive chemical modifications occur in the 13,000-nucleotide-long pre-cursor rRNA before the rRNAs are cleaved out of it and assembled into ribo-somes. These include about 100 methylations of the 2¢-OH positions onnucleotide sugars and 100 isomerizations of uridine nucleotides to pseudouri-dine (Figure 6–43A). The functions of these modifications are not understood indetail, but they probably aid in the folding and assembly of the final rRNAs andmay also subtly alter the function of ribosomes. Each modification is made at aspecific position in the precursor rRNA. These positions are specified by severalhundred “guide RNAs,” which locate themselves through base-pairing to theprecursor rRNA and thereby bring an RNA-modifying enzyme to the appropri-ate position (Figure 6–43B). Other guide RNAs promote cleavage of the precur-sor rRNAs into the mature rRNAs, probably by causing conformationalchanges in the precursor rRNA. All of these guide RNAs are members of a largeclass of RNAs called small nucleolar RNAs (or snoRNAs), so named becausethese RNAs perform their functions in a subcompartment of the nucleuscalled the nucleolus. Many snoRNAs are encoded in the introns of other genes,

330 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

2 mm

Figure 6–41 Transcription fromtandemly arranged rRNA genes, asseen in the electron microscope. Thepattern of alternating transcribed geneand nontranscribed spacer is readily seen.A higher-magnification view was shown inFigure 6–9. (From V.E. Foe, Cold SpringHarbor Symp. Quant. Biol. 42:723–740,1978.)

13,000 nucleotides

CLEAVAGE

CHEMICAL MODIFICATION

degraded regions ofnucleotide sequence

18S rRNA 5.8S rRNA 28S rRNA

5S rRNA madeelsewhere

incorporated into smallribosomal subunit

incorporated into largeribosomal subunit

ppp5¢ 3¢

OH

45S precursor rRNA Figure 6–42 The chemicalmodification and nucleolyticprocessing of a eucaryotic 45Sprecursor rRNA molecule into threeseparate ribosomal RNAs. Asindicated, two types of chemicalmodifications (shown in Figure 6–43) aremade to the precursor rRNA before it iscleaved. Nearly half of the nucleotidesequences in this precursor rRNA arediscarded and degraded in the nucleus.The rRNAs are named according to their“S” values, which refer to their rate ofsedimentation in an ultra-centrifuge.Thelarger the S value, the larger the rRNA.

Page 33: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

especially those encoding ribosomal proteins. They are therefore synthesized byRNA polymerase II and processed from excised intron sequences.

The Nucleolus Is a Ribosome-Producing Factory

The nucleolus is the most obvious structure seen in the nucleus of a eucaryoticcell when viewed in the light microscope. Consequently, it was so closely scruti-nized by early cytologists that an 1898 review could list some 700 references. Wenow know that the nucleolus is the site for the processing of rRNAs and theirassembly into ribosomes. Unlike other organelles in the cell, it is not bound by amembrane (Figure 6–44); instead, it is a large aggregate of macromolecules,including the rRNA genes themselves, precursor rRNAs, mature rRNAs, rRNA-processing enzymes, snoRNPs, ribosomal protein subunits and partly assembled

FROM DNA TO RNA 331

Figure 6–43 Modifications of theprecursor rRNA by guide RNAs.(A) Two prominent covalent modificationsoccur after rRNA synthesis; thedifferences from the initially incorporatednucleotide are indicated by red atoms.(B) As indicated, snoRNAs locate the sitesof modification by base-pairing tocomplementary sequences on theprecursor rRNA.The snoRNAs are boundto proteins, and the complexes are calledsnoRNPs. snoRNPs contain the RNAmodification activities, presumablycontributed by the proteins but possiblyby the snoRNAs themselves.

O H2C

OH OH

C

C

C

HN

O

O

ribose

O

OH O

CH3

HOH2CHO

ribose

base

(A)

(B)

pseudouridine 2¢-O-methylated nucleotide

SnoRNA

SnoRNA

RNA-modifyingenzyme

RNA-modifying enzyme

precursorrRNA

NH

CH

Figure 6–44 Electron micrograph ofa thin section of a nucleolus in ahuman fibroblast, showing its threedistinct zones. (A) View of entirenucleus. (B) High-power view of thenucleolus. It is believed that transcriptionof the rRNA genes takes place betweenthe fibrillar center and the dense fibrillarcomponent and that processing of therRNAs and their assembly into ribosomesproceeds outward from the dense fibrillarcomponent to the surrounding granularcomponents. (Courtesy of E.G. Jordan and J. McGovern.)

nuclearenvelope

nucleolus

densefibrillarcomponent

granularcomponent

fibrillarcenter

(B)(A)2 mm 1 mm

peripheral heterochromatin

Page 34: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

ribosomes. The close association of all these components presumably allows theassembly of ribosomes to occur rapidly and smoothly.

It is not yet understood how the nucleolus is held together and organized,but various types of RNA molecules play a central part in its chemistry andstructure, suggesting that the nucleolus may have evolved from an ancientstructure present in cells dominated by RNA catalysis. In present-day cells, therRNA genes also have an important role in forming the nucleolus. In a diploidhuman cell, the rRNA genes are distributed into 10 clusters, each of which islocated near the tip of one of the two copies of five different chromosomes (seeFigure 4–11). Each time a human cell undergoes mitosis, the chromosomes dis-perse and the nucleolus breaks up; after mitosis, the tips of the 10 chromosomescoalesce as the nucleolus reforms (Figures 6–45 and 6–46). The transcription ofthe rRNA genes by RNA polymerase I is necessary for this process.

As might be expected, the size of the nucleolus reflects the number of ribo-somes that the cell is producing. Its size therefore varies greatly in different cellsand can change in a single cell, occupying 25% of the total nuclear volume incells that are making unusually large amounts of protein.

A schematic diagram of the assembly of ribosomes is shown in Figure 6–47.In addition to its important role in ribosome biogenesis, the nucleolus is also thesite where other RNAs are produced and other RNA–protein complexes areassembled. For example, the U6 snRNP, which, as we have seen, functions inpre-mRNA splicing (see Figure 6–29), is composed of one RNA molecule and atleast seven proteins. The U6 snRNA is chemically modified by snoRNAs in thenucleolus before its final assembly there into the U6 snRNP. Other importantRNA protein complexes, including telomerase (encountered in Chapter 5) andthe signal recognition particle (which we discuss in Chapter 12), are also believedto be assembled at the nucleolus. Finally, the tRNAs (transfer RNAs) that carry theamino acids for protein synthesis are processed there as well. Thus, the nucleo-lus can be thought of as a large factory at which many different noncoding RNAsare processed and assembled with proteins to form a large variety of ribonucleo-protein complexes.

The Nucleus Contains a Variety of Subnuclear Structures

Although the nucleolus is the most prominent structure in the nucleus, severalother nuclear bodies have been visualized and studied (Figure 6–48). Theseinclude Cajal bodies (named for the scientist who first described them in 1906),GEMS (Gemini of coiled bodies), and interchromatin granule clusters (also called“speckles”). Like the nucleolus, these other nuclear structures lack membranes

332 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

Figure 6–46 Nucleolar fusion. Theselight micrographs of human fibroblastsgrown in culture show various stages ofnucleolar fusion.After mitosis, each of theten human chromosomes that carry acluster of rRNA genes begins to form atiny nucleolus, but these rapidly coalesceas they grow to form the single largenucleolus typical of many interphase cells.(Courtesy of E.G. Jordan and J. McGovern.)

nuclearenvelope

nucleolus

nucleolardissociation

nucleolarassociation

preparationfor mitosisG2

prophase

metaphase

anaphase

MIT

OS

IS

telophase

preparationfor DNAreplication

G1

S DNAreplication

Figure 6–45 Changes in the appearance of the nucleolus in ahuman cell during the cell cycle. Only the cell nucleus is represented inthis diagram. In most eucaryotic cells the nuclear membrane breaks downduring mitosis, as indicated by the dashed circles.

10 mm

Page 35: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

and are highly dynamic; their appearance is probably the result of the tightassociation of protein and RNA (and perhaps DNA) components involved inthe synthesis, assembly, and storage of macromolecules involved in geneexpression. Cajal bodies and GEMS resemble one another and are frequentlypaired in the nucleus; it is not clear whether they are truly distinct structures.They may be sites where snRNAs and snoRNAs undergo their final modifica-tions and assembly with protein. Both the RNAs and the proteins that make upthe snRNPs are partly assembled in the cytoplasm, but they are transported intothe nucleus for their final modifications. It has been proposed that Cajal bod-ies/GEMS are also sites where the snRNPs are recycled and their RNAs are “reset”after the rearrangements that occur during splicing (see p. 322). In contrast, theinterchromatin granule clusters have been proposed to be stockpiles of fullymature snRNPs that are ready to be used in splicing of pre-mRNAs (Figure 6–49).

Scientists have had difficulties in working out the function of the small sub-nuclear structures just described. Much of the progress now being madedepends on genetic tools—examination of the effects of designed mutations inmice or of spontaneous mutations in humans. As one example, GEMS containthe SMN (survival of motor neurons) protein. Certain mutations of the geneencoding this protein are the cause of inherited spinal muscular atrophy, ahuman disease characterized by a wasting away of the muscles. The disease seems

FROM DNA TO RNA 333

Figure 6–47 The function of thenucleolus in ribosome and otherribonucleoprotein synthesis. The 45Sprecursor rRNA is packaged in a largeribonucleoprotein particle containingmany ribosomal proteins imported fromthe cytoplasm.While this particle remainsin the nucleolus, selected pieces are addedand others discarded as it is processedinto immature large and small ribosomalsubunits.The two ribosomal subunits arethought to attain their final functionalform only as each is individuallytransported through the nuclear poresinto the cytoplasm. Otherribonucleoprotein complexes, includingtelomerase shown here, are alsoassembled in the nucleolus.

NUCLEOLUS

NUCLEUS

CYTOPLASM

loop of nucleolar organizer DNA

rRNA gene

TRANSCRIPTION 45S rRNAprecursor

ribosomalproteinsmade incytoplasm

largeribonucleo-proteinparticle

RECYCLING OF RNAsAND PROTEINS INVOLVED INrRNA PROCESSING

5S rRNA

immature largesubunit

large subunit

smallsubunit

TRANSPORT ANDFINAL ASSEMBLYOF RIBOSOMES

40Ssubunit

60Ssubunit

SnoRNAs

proteinsinvolved in processingof rRNA

MODIFICATIONAND PROCESSING

OF rRNAs

telomerase RNA

telomeraseproteins

telomerase

Page 36: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

to be caused by a subtle defect in snRNP assembly and subsequent pre-mRNAsplicing. More severe defects would be expected to be lethal.

Given the importance of nuclear subdomains in RNA processing, it mighthave been expected that pre-mRNA splicing would occur in a particular locationin the nucleus, as it requires numerous RNA and protein components. However,

334 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

Figure 6–49 Schematic view ofsubnuclear structures. A typicalvertebrate nucleus has several Cajalbodies, which are proposed to be the siteswhere snRNPs and snoRNPs undergotheir final modifications. Interchromatingranule clusters are proposed to bestorage sites for fully mature snRNPs.A typical vertebrate nucleus has 20–50interchromatin granule clusters.

After their initial synthesis, snRNAs areexported from the nucleus, after whichthey undergo 5¢ and 3¢ end-processing andassemble with the seven common snRNPproteins (called Sm proteins).Thesecomplexes are reimported into thenucleus and the snRNPs undergo theirfinal modification in Cajal bodies. Inaddition, the U6 snRNP requires chemicalmodification by snoRNAs in the nucleolus.The sites of active transcription andsplicing (approximately 2000–3000 sitesper vertebrate nucleus) correspond to the“perichromatin fibers” seen under theelectron microscope. (Adapted from J.D. Lewis and D.Tollervey, Science288:1385–1389, 2000.)

snRNPRECYCLING

snRNAs

partially assembledsnRNPs

snRNPproteins

Cajal bodies and GEMS

interchomatingranuleclusters

sites of activetranscription and

RNA splicing

chromosome territories

NUCLEOLUS

nuclear envelope

(A) (B)

(C) (E)(D)5 mm

Figure 6–48 Visualization of chromatin and nuclear bodies. (A)–(D) show micrographs of the samehuman cell nucleus, each processed differently to show a particular set of nuclear structures. (E) shows anenlarged superposition of all four individual images. (A) shows the location of the protein fibrillarin (acomponent of several snoRNPs), which is present at both nucleoli and Cajal bodies, the latter indicated byarrows. (B) shows interchromatin granule clusters or “speckles” detected by using antibodies against aprotein involved in pre-mRNA splicing. (C) is stained to show bulk chromatin. (D) shows the location of theprotein coilin, which is present at Cajal bodies (indicated by arrows). (From J.R. Swedlow and A.I. Lamond,Gen. Biol. 2:1–7, 2001; micrographs courtesy of Judith Sleeman.)

Page 37: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

we have seen that the assembly of splicing components on pre-mRNA is co-transcriptional; thus splicing must occur at many locations along chromosomes.We saw in Chapter 4 that interphase chromosomes occupy discrete territories inthe nucleus, and transcription and pre-mRNA splicing must take place withinthese territories. However, interphase chromosomes are themselves dynamicand their exact positioning in the nucleus correlates with gene expression. Forexample, transcriptionally silent regions of interphase chromosomes are oftenassociated with the nuclear envelope where the concentration of heterochro-matin components is believed to be especially high. When these same regionsbecome transcriptionally active, they relocate towards the interior of the nucle-us, which is richer in the components required for mRNA synthesis. It has beenproposed that, although a typical mammalian cell may be expressing on theorder of 15,000 genes, transcription and RNA splicing may be localized to onlyseveral thousand sites in the nucleus. These sites themselves are highly dynamicand probably result from the association of transcription and splicing compo-nents to create small “assembly lines” where the local concentration of thesecomponents is very high. As a result, the nucleus seems to be highly organizedinto subdomains, with snRNPs, snoRNPs, and other nuclear componentsmoving between them in an orderly fashion according to the needs of the cell(Figure 6–49).

Summary

Before the synthesis of a particular protein can begin, the corresponding mRNAmolecule must be produced by transcription. Bacteria contain a single type of RNApolymerase (the enzyme that carries out the transcription of DNA into RNA). AnmRNA molecule is produced when this enzyme initiates transcription at a promoter,synthesizes the RNA by chain elongation, stops transcription at a terminator, andreleases both the DNA template and the completed mRNA molecule. In eucaryoticcells, the process of transcription is much more complex, and there are three RNApolymerases—designated polymerase I, II, and III—that are related evolutionarilyto one another and to the bacterial polymerase.

Eucaryotic mRNA is synthesized by RNA polymerase II. This enzyme requires aseries of additional proteins, termed the general transcription factors, to initiatetranscription on a purified DNA template and still more proteins (including chro-matin-remodeling complexes and histone acetyltransferases) to initiate transcriptionon its chromatin template inside the cell. During the elongation phase of transcrip-tion, the nascent RNA undergoes three types of processing events: a specialnucleotide is added to its 5¢ end (capping), intron sequences are removed from themiddle of the RNA molecule (splicing), and the 3¢ end of the RNA is generated (cleav-age and polyadenylation). Some of these RNA processing events that modify the ini-tial RNA transcript (for example, those involved in RNA splicing) are carried outprimarily by special small RNA molecules.

For some genes, RNA is the final product. In eucaryotes, these genes are usuallytranscribed by either RNA polymerase I or RNA polymerase III. RNA polymerase Imakes the ribosomal RNAs. After their synthesis as a large precursor, the rRNAs arechemically modified, cleaved, and assembled into ribosomes in the nucleolus—adistinct subnuclear structure that also helps to process some smaller RNA–proteincomplexes in the cell. Additional subnuclear structures (including Cajal bodiesand interchromatin granule clusters) are sites where components involved in RNAprocessing are assembled, stored, and recycled.

FROM RNA TO PROTEINIn the preceding section we have seen that the final product of some genes is anRNA molecule itself, such as those present in the snRNPs and in ribosomes.However, most genes in a cell produce mRNA molecules that serve as interme-diaries on the pathway to proteins. In this section we examine how the cell con-verts the information carried in an mRNA molecule into a protein molecule. Thisfeat of translation first attracted the attention of biologists in the late 1950s,

FROM RNA TO PROTEIN 335

Page 38: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

when it was posed as the “coding problem”: how is the information in a linearsequence of nucleotides in RNA translated into the linear sequence of a chemi-cally quite different set of subunits—the amino acids in proteins? This fascinat-ing question stimulated great excitement among scientists at the time. Here wasa cryptogram set up by nature that, after more than 3 billion years of evolution,could finally be solved by one of the products of evolution—human beings. Andindeed, not only has the code been cracked step by step, but in the year 2000 theelaborate machinery by which cells read this code—the ribosome—was finallyrevealed in atomic detail.

An mRNA Sequence Is Decoded in Sets of Three Nucleotides

Once an mRNA has been produced, by transcription and processing the infor-mation present in its nucleotide sequence is used to synthesize a protein. Tran-scription is simple to understand as a means of information transfer: since DNAand RNA are chemically and structurally similar, the DNA can act as a directtemplate for the synthesis of RNA by complementary base-pairing. As the termtranscription signifies, it is as if a message written out by hand is being converted,say, into a typewritten text. The language itself and the form of the message donot change, and the symbols used are closely related.

In contrast, the conversion of the information in RNA into protein repre-sents a translation of the information into another language that uses quite dif-ferent symbols. Moreover, since there are only four different nucleotides inmRNA and twenty different types of amino acids in a protein, this translationcannot be accounted for by a direct one-to-one correspondence between anucleotide in RNA and an amino acid in protein. The nucleotide sequence of agene, through the medium of mRNA, is translated into the amino acid sequenceof a protein by rules that are known as the genetic code. This code was deci-phered in the early 1960s.

The sequence of nucleotides in the mRNA molecule is read consecutively ingroups of three. RNA is a linear polymer of four different nucleotides, so thereare 4 ¥ 4 ¥ 4 = 64 possible combinations of three nucleotides: the triplets AAA,AUA, AUG, and so on. However, only 20 different amino acids are commonlyfound in proteins. Either some nucleotide triplets are never used, or the code isredundant and some amino acids are specified by more than one triplet. Thesecond possibility is, in fact, the correct one, as shown by the completely deci-phered genetic code in Figure 6–50. Each group of three consecutive nucleotidesin RNA is called a codon, and each codon specifies either one amino acid or astop to the translation process.

This genetic code is used universally in all present-day organisms. Althougha few slight differences in the code have been found, these are chiefly in the DNAof mitochondria. Mitochondria have their own transcription and protein syn-thesis systems that operate quite independently from those of the rest of the cell,and it is understandable that their small genomes have been able to accommo-date minor changes to the code (discussed in Chapter 14).

In principle, an RNA sequence can be translated in any one of three differ-ent reading frames, depending on where the decoding process begins (Figure6–51). However, only one of the three possible reading frames in an mRNAencodes the required protein. We see later how a special punctuation signal atthe beginning of each RNA message sets the correct reading frame at the start ofprotein synthesis.

336 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

C U C A G C G U U A C C A U

Leu Ser Val Thr

1

C U C A G C G U U A C C A U2

Ser Ala Leu Pro

C U C A G C G U U A C C A U

Gln Arg Tyr His3

Figure 6–51 The three possiblereading frames in protein synthesis.In the process of translating a nucleotidesequence (blue) into an amino acidsequence (green), the sequence ofnucleotides in an mRNA molecule is readfrom the 5¢ to the 3¢ end in sequentialsets of three nucleotides. In principle,therefore, the same RNA sequence canspecify three completely different aminoacid sequences, depending on the readingframe. In reality, however, only one ofthese reading frames contains the actual message.

Figure 6–50 The genetic code. Thestandard one-letter abbreviation for eachamino acid is presented below its three-letter abbreviation (see Panel 3–1,pp. 132–133, for the full name of eachamino acid and its structure). Byconvention, codons are always writtenwith the 5¢-terminal nucleotide to the left.Note that most amino acids arerepresented by more than one codon, andthat there are some regularities in the setof codons that specifies each amino acid.Codons for the same amino acid tend tocontain the same nucleotides at the firstand second positions, and vary at the thirdposition.Three codons do not specify anyamino acid but act as termination sites(stop codons), signaling the end of the protein-coding sequence. One codon—AUG—acts both as an initiationcodon, signaling the start of a protein-coding message, and also as the codonthat specifies methionine.

GCAGCCGCGGCU

Ala

A

AGAAGGCGACGCCGGCGU

Arg

R

GACGAU

Asp

D

AACAAU

Asn

N

UGCUGU

Cys

C

GAAGAG

Glu

E

CAACAG

Gln

Q

GGAGGCGGGGGU

Gly

G

CACCAU

His

H

AUAAUCAUU

Ile

I

UUAUUGCUACUCCUGCUU

Leu

L

AAAAAG

Lys

K

AUG

Met

M

UUCUUU

Phe

F

CCACCCCCGCCU

Pro

P

AGCAGUUCAUCCUCGUCU

Ser

S

ACAACCACGACU

Thr

T

UGG

Trp

W

UACUAU

Tyr

Y

GUAGUCGUGGUU

Val

V

UAAUAGUGA

stop

Page 39: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

tRNA Molecules Match Amino Acids to Codons in mRNA

The codons in an mRNA molecule do not directly recognize the amino acidsthey specify: the group of three nucleotides does not, for example, bind directlyto the amino acid. Rather, the translation of mRNA into protein depends onadaptor molecules that can recognize and bind both to the codon and, at anothersite on their surface, to the amino acid. These adaptors consist of a set of smallRNA molecules known as transfer RNAs (tRNAs), each about 80 nucleotides inlength.

We saw earlier in this chapter that RNA molecules can fold up into preciselydefined three-dimensional structures, and the tRNA molecules provide a strik-ing example. Four short segments of the folded tRNA are double-helical, pro-ducing a molecule that looks like a cloverleaf when drawn schematically (Figure6–52A). For example, a 5¢-GCUC-3¢ sequence in one part of a polynucleotidechain can form a relatively strong association with a 5¢-GAGC-3¢ sequence inanother region of the same molecule. The cloverleaf undergoes further folding toform a compact L-shaped structure that is held together by additional hydrogenbonds between different regions of the molecule (Figure 6–52B,C).

Two regions of unpaired nucleotides situated at either end of the L-shapedmolecule are crucial to the function of tRNA in protein synthesis. One of theseregions forms the anticodon, a set of three consecutive nucleotides that pairswith the complementary codon in an mRNA molecule. The other is a short sin-gle-stranded region at the 3¢ end of the molecule; this is the site where the aminoacid that matches the codon is attached to the tRNA.

We have seen in the previous section that the genetic code is redundant; thatis, several different codons can specify a single amino acid (see Figure 6–50).This redundancy implies either that there is more than one tRNA for many of theamino acids or that some tRNA molecules can base-pair with more than one

FROM RNA TO PROTEIN 337

ACGCUUAAG A C A C

C

C

U A

G

T ΨGUGUCC

UGGAG

GUCΨ

AY

AAGUCA

GA

GCC

CGAGAGGGD

D GA CUC GAUUUAGGCG

attached aminoacid (Phe)

3¢ end

5¢ end

anticodon

anticodonloop

anticodon

(A)

(D)

(B) (C)

D loop

T loop

5¢ GCGGAUUUAGCUCAGDDGGGAGAGCGCCAGACUGAAYAΨCUGGAGGUCCUGUGTΨCGAUCCACAGAAUUCGCACCA 3¢

ACC

a clover leaf

Figure 6–52 A tRNA molecule. In this series of diagrams, the same tRNA molecule—in this case atRNA specific for the amino acid phenylalanine (Phe)—is depicted in various ways. (A) The cloverleafstructure, a convention used to show the complementary base-pairing (red lines) that creates the double-helical regions of the molecule.The anticodon is the sequence of three nucleotides that base-pairswith a codon in mRNA.The amino acid matching the codon/anticodon pair is attached at the 3¢ end of thetRNA. tRNAs contain some unusual bases, which are produced by chemical modification after the tRNAhas been synthesized. For example, the bases denoted y (for pseudouridine—see Figure 6–43) and D (fordihydrouridine—see Figure 6–55) are derived from uracil. (B and C) Views of the actual L-shaped molecule,based on x-ray diffraction analysis.Although a particular tRNA, that for the amino acid phenylalanine, isdepicted, all other tRNAs have very similar structures. (D) The linear nucleotide sequence of the molecule,color-coded to match A, B, and C.

Page 40: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

codon. In fact, both situations occur. Some amino acids have more than onetRNA and some tRNAs are constructed so that they require accurate base-pair-ing only at the first two positions of the codon and can tolerate a mismatch (orwobble) at the third position (Figure 6–53). This wobble base-pairing explainswhy so many of the alternative codons for an amino acid differ only in their thirdnucleotide (see Figure 6–50). In bacteria, wobble base-pairings make it possibleto fit the 20 amino acids to their 61 codons with as few as 31 kinds of tRNAmolecules. The exact number of different kinds of tRNAs, however, differs fromone species to the next. For example, humans have 497 tRNA genes but, amongthem, only 48 different anticodons are represented.

tRNAs Are Covalently Modified Before They Exit from the Nucleus

We have seen that most eucaryotic RNAs are covalently altered before they areallowed to exit from the nucleus, and tRNAs are no exception. Eucaryotic tRNAsare synthesized by RNA polymerase III. Both bacterial and eucaryotic tRNAs aretypically synthesized as larger precursor tRNAs, and these are then trimmed toproduce the mature tRNA. In addition, some tRNA precursors (from both bac-teria and eucaryotes) contain introns that must be spliced out. This splicingreaction is chemically distinct from that of pre-mRNA splicing; rather than gen-erating a lariat intermediate, tRNA splicing occurs through a cut-and-pastemechanism that is catalyzed by proteins (Figure 6–54). Trimming and splicingboth require the precursor tRNA to be correctly folded in its cloverleaf configu-ration. Because misfolded tRNA precursors will not be processed properly, the

338 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

Figure 6–54 Structure of a tRNA-splicing endonuclease docked to aprecursor tRNA. The endonuclease (afour-subunit enzyme) removes the tRNAintron (blue). A second enzyme, amultifunctional tRNA ligase (not shown),then joins the two tRNA halves together.(Courtesy of Hong Li, Christopher Trotta,and John Abelson.)

Figure 6–53 Wobble base-pairing between codons and anticodons.If the nucleotide listed in the first column is present at the third, or wobble,position of the codon, it can base-pair with any of the nucleotides listed inthe second column.Thus, for example, when inosine (I) is present in thewobble position of the tRNA anticodon, the tRNA can recognize any one ofthree different codons in bacteria and either of two codons in eucaryotes.The inosine in tRNAs is formed from the deamination of guanine (see Figure6–55), a chemical modification which takes place after the tRNA has beensynthesized.The nonstandard base pairs, including those made with inosine,are generally weaker than conventional base pairs. Note thatcodon–anticodon base pairing is more stringent at positions 1 and 2 of thecodon: here only conventional base pairs are permitted.The differences inwobble base-pairing interactions between bacteria and eucaryotespresumably result from subtle structural differences between bacterial andeucaryotic ribosomes, the molecular machines that perform proteinsynthesis. (Adapted from C. Guthrie and J.Abelson, in The Molecular Biologyof the Yeast Saccharomyces: Metabolism and Gene Expression, pp. 487–528.Cold Spring Harbor, New York: Cold Spring Harbor Laboratory Press, 1982.)

mRNA

tRNA

5¢3¢

wobble position

codon

C G or I

U A, G, or I

wobble codonbase

possibleanticodon bases

anticodon

bacteria

G C or U

A U or I

C G or I

U G or I

wobble codonbase

possibleanticodon bases

eucaryotes

G C

A U

Page 41: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

trimming and splicing reactions are thought to act as quality-control steps in thegeneration of tRNAs.

All tRNAs are also subject to a variety of chemical modifications—nearly onein 10 nucleotides in each mature tRNA molecule is an altered version of a stan-dard G, U, C, or A ribonucleotide. Over 50 different types of tRNA modificationsare known; a few are shown in Figure 6–55. Some of the modified nucleotides—most notably inosine, produced by the deamination of guanosine—affect theconformation and base-pairing of the anticodon and thereby facilitate therecognition of the appropriate mRNA codon by the tRNA molecule (see Figure6–53). Others affect the accuracy with which the tRNA is attached to the correctamino acid.

Specific Enzymes Couple Each Amino Acid to Its Appropriate tRNA Molecule

We have seen that, to read the genetic code in DNA, cells make a series of differ-ent tRNAs. We now consider how each tRNA molecule becomes linked to the oneamino acid in 20 that is its appropriate partner. Recognition and attachment ofthe correct amino acid depends on enzymes called aminoacyl-tRNA syn-thetases, which covalently couple each amino acid to its appropriate set of tRNAmolecules (Figures 6–56 and 6–57). For most cells there is a different synthetaseenzyme for each amino acid (that is, 20 synthetases in all); one attaches glycine

FROM RNA TO PROTEIN 339

Figure 6–55 A few of the unusualnucleotides found in tRNAmolecules. These nucleotides areproduced by covalent modification of anormal nucleotide after it has beenincorporated into an RNA chain. In mosttRNA molecules about 10% of thenucleotides are modified (see Figure 6–52).

ribose

H

HN

N N

N

N

O

CH3

CH3

ribose

H

N

N

O

O

HH

H

H

ribose

H

N

N

S

O

H

H

two methyl groups added to G(N,N-dimethyl G)

two hydrogens added to U(dihydro U)

ribose

H

N

N N

N

H

H

O

deamination of G(inosine)

sulfur replaces oxygen in U(4-thiouridine)

Figure 6–56 Amino acid activation.The two-step process in which an aminoacid (with its side chain denoted by R) isactivated for protein synthesis by anaminoacyl-tRNA synthetase enzyme isshown.As indicated, the energy of ATPhydrolysis is used to attach each aminoacid to its tRNA molecule in a high-energy linkage.The amino acid is firstactivated through the linkage of itscarboxyl group directly to an AMP moiety,forming an adenylated amino acid; thelinkage of the AMP, normally anunfavorable reaction, is driven by thehydrolysis of the ATP molecule thatdonates the AMP. Without leaving thesynthetase enzyme, the AMP-linkedcarboxyl group on the amino acid is thentransferred to a hydroxyl group on thesugar at the 3¢ end of the tRNA molecule.This transfer joins the amino acid by anactivated ester linkage to the tRNA andforms the final aminoacyl-tRNA molecule.The synthetase enzyme is not shown inthis diagram.

R

C

H

CO

OHH2N

P

ATP

AMP

P

2

amino acid

adenylated amino acid

ribose adenine

P ribose adenine

tRNA

OH

aminoacyl-tRNA

R

C

H

CO

H2NR

C

H

CO

OH2N

Page 42: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

to all tRNAs that recognize codons for glycine, another attaches alanine to alltRNAs that recognize codons for alanine, and so on. Many bacteria, however,have fewer than 20 synthetases, and the same synthetase enzyme is responsiblefor coupling more than one amino acid to the appropriate tRNAs. In these cases,a single synthetase places the identical amino acid on two different types oftRNAs, only one of which has an anticodon that matches the amino acid. A sec-ond enzyme then chemically modifies each “incorrectly” attached amino acidso that it now corresponds to the anticodon displayed by its covalently linkedtRNA.

The synthetase-catalyzed reaction that attaches the amino acid to the 3¢ endof the tRNA is one of many cellular reactions coupled to the energy-releasinghydrolysis of ATP (see pp. 83–84), and it produces a high-energy bond betweenthe tRNA and the amino acid. The energy of this bond is used at a later stage inprotein synthesis to link the amino acid covalently to the growing polypeptidechain.

Although the tRNA molecules serve as the final adaptors in convertingnucleotide sequences into amino acid sequences, the aminoacyl-tRNA syn-thetase enzymes are adaptors of equal importance in the decoding process (Fig-ure 6–58). This was established by an ingenious experiment in which an aminoacid (cysteine) was chemically converted into a different amino acid (alanine)after it already had been attached to its specific tRNA. When such “hybrid”aminoacyl-tRNA molecules were used for protein synthesis in a cell-free system,the wrong amino acid was inserted at every point in the protein chain where thattRNA was used. Although cells have several quality control mechanisms to avoidthis type of mishap, the experiment clearly establishes that the genetic code istranslated by two sets of adaptors that act sequentially. Each matches onemolecular surface to another with great specificity, and it is their combined

340 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

Figure 6–58 The genetic code istranslated by means of two adaptorsthat act one after another. The firstadaptor is the aminoacyl-tRNAsynthetase, which couples a particularamino acid to its corresponding tRNA; thesecond adaptor is the tRNA moleculeitself, whose anticodon forms base pairswith the appropriate codon on the mRNA.An error in either step would cause thewrong amino acid to be incorporated intoa protein chain. In the sequence of eventsshown, the amino acid tryptophan (Trp) isselected by the codon UGG on themRNA.

HN

CH

C

CH2

H2N C C

HO

OH

HN

CH

C

CH2

H2N C

H

HN

CH

C

CH2

amino acid(tryptophan)

tRNA synthetase(tryptophanyltRNA synthetase)

linkage of amino acidto tRNA

tRNA binds toits codon in RNA

mRNA

NET RESULT: AMINO ACID IS SELECTED BY ITS CODON

CO

OH2N C

H

CO

O

A C CA C CA C C

ATP AMP + 2Pi

U G G5¢ 3¢

base-pairing

high-energybondtRNA

(tRNATrp)

O OC

CH R

NH2

OO

C

CH R

NH2

O

O_O

O

P

CH2

O

N

N

N

N

HCC

CC

CH

NH2

OH

3¢ 2¢

amino acid

aminoacyl-tRNA

(A) (B) Figure 6–57 The structure of theaminoacyl-tRNA linkage. The carboxylend of the amino acid forms an esterbond to ribose. Because the hydrolysis ofthis ester bond is associated with a largefavorable change in free energy, an aminoacid held in this way is said to beactivated. (A) Schematic drawing of thestructure.The amino acid is linked to thenucleotide at the 3¢ end of the tRNA (seeFigure 6–52). (B) Actual structurecorresponding to boxed region in (A).These are two major classes of synthetaseenzymes: one links the amino acid directlyto the 3¢-OH group of the ribose, and theother links it initially to the 2¢-OH group.In the latter case, a subsequenttransesterification reaction shifts theamino acid to the 3¢ position. As in Figure6–56, the "R-group" indicates the sidechain of the amino acid.

Page 43: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

action that associates each sequence of three nucleotides in the mRNA mole-cule—that is, each codon—with its particular amino acid.

Editing by RNA Synthetases Ensures Accuracy

Several mechanisms working together ensure that the tRNA synthetase links thecorrect amino acid to each tRNA. The synthetase must first select the correctamino acid, and most do so by a two-step mechanism. First, the correct aminoacid has the highest affinity for the active-site pocket of its synthetase and istherefore favored over the other 19. In particular, amino acids larger than thecorrect one are effectively excluded from the active site. However, accurate dis-crimination between two similar amino acids, such as isoleucine and valine(which differ by only a methyl group), is very difficult to achieve by a one-steprecognition mechanism. A second discrimination step occurs after the aminoacid has been covalently linked to AMP (see Figure 6–56). When tRNA binds thesynthetase, it forces the amino acid into a second pocket in the synthetase, theprecise dimensions of which exclude the correct amino acid but allow access byclosely related amino acids. Once an amino acid enters this editing pocket, it ishydrolyzed from the AMP (or from the tRNA itself if the aminoacyl-tRNA bondhas already formed) and released from the enzyme. This hydrolytic editing,which is analogous to the editing by DNA polymerases (Figure 6–59), raises theoverall accuracy of tRNA charging to approximately one mistake in 40,000 cou-plings.

The tRNA synthetase must also recognize the correct set of tRNAs, andextensive structural and chemical complementarity between the synthetase andthe tRNA allows various features of the tRNA to be sensed (Figure 6–60). MosttRNA synthetases directly recognize the matching tRNA anticodon; these syn-thetases contain three adjacent nucleotide-binding pockets, each of which iscomplementary in shape and charge to the nucleotide in the anticodon. Forother synthetases it is the nucleotide sequence of the acceptor stem that is thekey recognition determinant. In most cases, however, nucleotides at severalpositions on the tRNA are “read” by the synthetase.

FROM RNA TO PROTEIN 341

5¢3¢

SYNTHESIZING EDITING

SYNTHESIZING EDITING

tRNA

synthesissite

polymerizationsite

editing site

editing site

incorrectamino acid

incorrectamino acidwill be removed

(A)

(B)

incorrectnucleotidewill be removed

incorrectnucleotideadded

Figure 6–59 Hydrolytic editing.(A) tRNA synthetases remove their owncoupling errors through hydrolytic editingof incorrectly attached amino acids.Asdescribed in the text, the correct aminoacid is rejected by the editing site. (B) Theerror-correction process performed byDNA polymerase shows some similarities;however, it differs so far as the removalprocess depends strongly on a mispairingwith the template (see Figure 5–9).

Figure 6–60 The recognition of atRNA molecule by its aminoacyl-tRNA synthetase. For this tRNA(tRNAGln), specific nucleotides in both theanticodon (bottom) and the amino acid-accepting arm allow the correct tRNA tobe recognized by the synthetase enzyme(blue). (Courtesy of Tom Steitz.)

Page 44: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

342 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

400 nm

Figure 6–62 Ribosomes in thecytoplasm of a eucaryotic cell. Thiselectron micrograph shows a thin sectionof a small region of cytoplasm.Theribosomes appear as black dots (redarrows). Some are free in the cytosol;others are attached to membranes of theendoplasmic reticulum. (Courtesy ofDaniel S. Friend.)

H

H2N

R1

R2

C

O

C CN

H H

C

H

N C

H

C

R3O

O

O

C

H

C

O

O

H2N

R4

34

H

H2N

R1

R2

C

O

C CN

H H

C

H

N C

H

C

R3O

O

C

H

C

O

O

R4

N

H

4

OH

3

peptidyl tRNA attached toC-terminus of the growingpolypeptide chain

aminoacyltRNA

tRNA molecule freed fromits peptidyl linkage

new peptidyl tRNA molecule attached to C-terminus of the growing polypeptide chain

Figure 6–61 The incorporation of anamino acid into a protein. Apolypeptide chain grows by the stepwiseaddition of amino acids to its C-terminalend.The formation of each peptide bondis energetically favorable because thegrowing C-terminus has been activated bythe covalent attachment of a tRNAmolecule.The peptidyl-tRNA linkage thatactivates the growing end is regeneratedduring each addition.The amino acid sidechains have been abbreviated as R1, R2, R3,and R4; as a reference point, all of theatoms in the second amino acid in thepolypeptide chain are shaded gray.Thefigure shows the addition of the fourthamino acid to the growing chain.

Amino Acids Are Added to the C-terminal Endof a Growing Polypeptide Chain

Having seen that amino acids are first coupled to tRNA molecules, we now turnto the mechanism by which they are joined together to form proteins. The fun-damental reaction of protein synthesis is the formation of a peptide bondbetween the carboxyl group at the end of a growing polypeptide chain and a freeamino group on an incoming amino acid. Consequently, a protein is synthesizedstepwise from its N-terminal end to its C-terminal end. Throughout the entireprocess the growing carboxyl end of the polypeptide chain remains activated byits covalent attachment to a tRNA molecule (a peptidyl-tRNA molecule). Thishigh-energy covalent linkage is disrupted during each addition but is immedi-ately replaced by the identical linkage on the most recently added amino acid(Figure 6–61). In this way, each amino acid added carries with it the activationenergy for the addition of the next amino acid rather than the energy for its ownaddition—an example of the “head growth” type of polymerization described inFigure 2–68.

The RNA Message Is Decoded on Ribosomes

As we have seen, the synthesis of proteins is guided by information carried bymRNA molecules. To maintain the correct reading frame and to ensure accuracy(about 1 mistake every 10,000 amino acids), protein synthesis is performed inthe ribosome, a complex catalytic machine made from more than 50 differentproteins (the ribosomal proteins) and several RNA molecules, the ribosomalRNAs (rRNAs). A typical eucaryotic cell contains millions of ribosomes in itscytoplasm (Figure 6–62). As we have seen, eucaryotic ribosomal subunits are

Page 45: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

assembled at the nucleolus, by the association of newly transcribed and modi-fied rRNAs with ribosomal proteins, which have been transported into thenucleus after their synthesis in the cytoplasm. The two ribosomal subunits arethen exported to the cytoplasm, where they perform protein synthesis.

Eucaryotic and procaryotic ribosomes are very similar in design and func-tion. Both are composed of one large and one small subunit that fit together toform a complete ribosome with a mass of several million daltons (Figure 6–63).The small subunit provides a framework on which the tRNAs can be accuratelymatched to the codons of the mRNA (see Figure 6–58), while the large subunitcatalyzes the formation of the peptide bonds that link the amino acids togetherinto a polypeptide chain (see Figure 6–61).

When not actively synthesizing proteins, the two subunits of the ribosomeare separate. They join together on an mRNA molecule, usually near its 5¢ end,to initiate the synthesis of a protein. The mRNA is then pulled through the ribo-some; as its codons encounter the ribosome’s active site, the mRNA nucleotidesequence is translated into an amino acid sequence using the tRNAs as adaptorsto add each amino acid in the correct sequence to the end of the growingpolypeptide chain. When a stop codon is encountered, the ribosome releasesthe finished protein, its two subunits separate again. These subunits can then beused to start the synthesis of another protein on another mRNA molecule.

Ribosomes operate with remarkable efficiency: in one second, a single ribo-some of a eucaryotic cell adds about 2 amino acids to a polypeptide chain; theribosomes of bacterial cells operate even faster, at a rate of about 20 amino acids

FROM RNA TO PROTEIN 343

Figure 6–63 A comparison of the structures of procaryotic and eucaryotic ribosomes. Ribosomalcomponents are commonly designated by their “S values,” which refer to their rate of sedimentation in anultracentrifuge. Despite the differences in the number and size of their rRNA and protein components, bothprocaryotic and eucaryotic ribosomes have nearly the same structure and they function similarly.Althoughthe 18S and 28S rRNAs of the eucaryotic ribosome contain many extra nucleotides not present in theirbacterial counterparts, these nucleotides are present as multiple insertions that form extra domains and leavethe basic structure of each rRNA largely unchanged.

70S

MW 2,500,000

50S 30S

MW 1,600,000 MW 900,000

5S rRNA 23S rRNA

120nucleotides 2900

nucleotides1540

nucleotides

16S rRNA

34 proteins 21 proteins

PROCARYOTIC RIBOSOME

80S

MW 4,200,000

60S 40S

MW 2,800,000 MW 1,400,000

5S rRNA 28S rRNA 5.8S rRNA 18S rRNA

120nucleotides

4700nucleotides

160nucleotides 1900

nucleotides

~49 proteins ~33 proteins

EUCARYOTIC RIBOSOME

Page 46: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

90˚

344 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

E P A

E-site P-site A-site

mRNA-binding site

largeribosomalsubunit

smallribosomalsubunit

(D)

(A) (B)

(C)

Figure 6–64 The RNA-binding sitesin the ribosome. Each ribosome hasthree binding sites for tRNA: the A-, P-,and E-sites (short for aminoacyl-tRNA,peptidyl-tRNA, and exit, respectively) andone binding site for mRNA. (A) Structureof a bacterial ribosome with the smallsubunit in the front (dark green) and thelarge subunit in the back (light green). Boththe rRNAs and the ribosomal proteins areshown. tRNAs are shown bound in the E-site (red), the P-site (orange) and the A-site(yellow). Although all three tRNA sites areshown occupied here, during the processof protein synthesis not more than two ofthese sites are thought to contain tRNAmolecules at any one time (see Figure6–65). (B) Structure of the large (left) andsmall (right) ribosomal subunits arrangedas though the ribosome in (A) wereopened like a book. (C) Structure of theribosome in (A) viewed from the top. (D)Highly schematic representation of aribosome (in the same orientation as C),which will be used in subsequent figures.(A, B, and C, adapted from M.M.Yusupov et al., Science 292:883–896, 2001, courtesyof Albion Bausom and Harry Noller.)

per second. How does the ribosome choreograph the many coordinated move-ments required for efficient translation? A ribosome contains four binding sitesfor RNA molecules: one is for the mRNA and three (called the A-site, the P-site,and the E-site) are for tRNAs (Figure 6–64). A tRNA molecule is held tightly at theA- and P-sites only if its anticodon forms base pairs with a complementarycodon (allowing for wobble) on the mRNA molecule that is bound to the ribo-some. The A- and P-sites are close enough together for their two tRNA moleculesto be forced to form base pairs with adjacent codons on the mRNA molecule.This feature of the ribosome maintains the correct reading frame on the mRNA.

Once protein synthesis has been initiated, each new amino acid is added tothe elongating chain in a cycle of reactions containing three major steps. Ourdescription of the chain elongation process begins at a point at which someamino acids have already been linked together and there is a tRNA molecule inthe P-site on the ribosome, covalently joined to the end of the growing polypep-tide (Figure 6–65). In step 1, a tRNA carrying the next amino acid in the chainbinds to the ribosomal A-site by forming base pairs with the codon in mRNApositioned there, so that the P-site and the A-site contain adjacent bound tRNAs.In step 2, the carboxyl end of the polypeptide chain is released from the tRNA atthe P-site (by breakage of the high-energy bond between the tRNA and its aminoacid) and joined to the free amino group of the amino acid linked to the tRNA atthe A-site, forming a new peptide bond. This central reaction of protein synthe-sis is catalyzed by a peptidyl transferase catalytic activity contained in the largeribosomal subunit. This reaction is accompanied by several conformationalchanges in the ribosome, which shift the two tRNAs into the E- and P-sites of thelarge subunit. In step 3, another series of conformational changes moves themRNA exactly three nucleotides through the ribosome and resets the ribosomeso it is ready to receive the next amino acyl tRNA. Step 1 is then repeated with anew incoming aminoacyl tRNA, and so on.

This three-step cycle is repeated each time an amino acid is added to thepolypeptide chain, and the chain grows from its amino to its carboxyl end untila stop codon is encountered.

Page 47: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

Elongation Factors Drive Translation Forward

The basic cycle of polypeptide elongation shown in outline in Figure 6–65 has anadditional feature that makes translation especially efficient and accurate. Twoelongation factors (EF-Tu and EF-G) enter and leave the ribosome during eachcycle, each hydrolyzing GTP to GDP and undergoing conformational changes inthe process. Under some conditions, ribosomes can be made to perform proteinsynthesis without the aid of the elongation factors and GTP hydrolysis, but thissynthesis is very slow, inefficient, and inaccurate. The process is speeded upenormously by coupling conformational changes in the elongation factors totransitions between different conformational states of the ribosome. Althoughthese conformational changes in the ribosome are not yet understood in detail,some may involve RNA rearrangements similar to those occurring in the RNAsof the spliceosome (see Figure 6–30). The cycles of elongation factor association,GTP hydrolysis, and dissociation ensures that the conformational changes occurin the “forward” direction and translation thereby proceeds efficiently (Figure6–66).

In addition to helping move translation forward, EF-Tu is thought toincrease the accuracy of translation by monitoring the initial interactionbetween a charged tRNA and a codon (see Figure 6–66). Charged tRNAs enterthe ribosome bound to the GTP-form of EF-Tu. Although the bound elongationfactor allows codon–anticodon pairing to occur, it prevents the amino acid frombeing incorporated into the growing polypeptide chain. The initial codon recog-nition, however, triggers the elongation factor to hydrolyze its bound GTP (toGDP and inorganic phosphate), whereupon the factor dissociates from the ribo-some without its tRNA, allowing protein synthesis to proceed. The elongationfactor introduces two short delays between codon–anticodon base pairing andpolypeptide chain elongation; these delays selectively permit incorrectly boundtRNAs to exit from the ribosome before the irreversible step of chain elongationoccurs. The first delay is the time required for GTP hydrolysis. The rate of GTPhydrolysis by EF-Tu is faster for a correct codon–anticodon pair than for anincorrect pair; hence an incorrectly bound tRNA molecule has a longer windowof opportunity to dissociate from the ribosome. In other words, GTP hydrolysisselectively captures the correctly bound tRNAs. A second lag occurs between EF-Tu dissociation and the full accommodation of the tRNA in the A site of the ribo-some. Although this lag is believed to be the same for correctly and incorrectlybound tRNAs, an incorrect tRNA molecule forms a smaller number ofcodon–anticodon hydrogen bonds than does a correctly matched pair and istherefore more likely to dissociate during this period. These two delays intro-duced by the elongation factor cause most incorrectly bound tRNA molecules(as well as a significant number of correctly bound molecules) to leave the

FROM RNA TO PROTEIN 345

Figure 6–65 Translating an mRNA molecule. Each amino acid added tothe growing end of a polypeptide chain is selected by complementary base-pairing between the anticodon on its attached tRNA molecule and the nextcodon on the mRNA chain. Because only one of the many types of tRNAmolecules in a cell can base-pair with each codon, the codon determines thespecific amino acid to be added to the growing polypeptide chain.The three-stepcycle shown is repeated over and over during the synthesis of a protein.Anaminoacyl-tRNA molecule binds to a vacant A-site on the ribosome in step 1, anew peptide bond is formed in step 2, and the mRNA moves a distance of threenucleotides through the small-subunit chain in step 3, ejecting the spent tRNAmolecule and “resetting” the ribosome so that the next aminoacyl-tRNAmolecule can bind. Although the figure shows a large movement of the smallribosome subunit relative to the large subunit, the conformational changes thatactually take place in the ribosome during translation are more subtle. It is likelythat they involve a series of small rearrangements within each subunit as well asseveral small shifts between the two subunits. As indicated, the mRNA istranslated in the 5¢-to-3¢ direction, and the N-terminal end of a protein is madefirst, with each cycle adding one amino acid to the C-terminus of the polypeptidechain.The position at which the growing peptide chain is attached to a tRNAdoes not change during the elongation cycle: it is always linked to the tRNApresent in the P site of the large subunit.

incomingaminoacyl-

tRNA

A

E P A

12

3H2N

H2N

H2N

H2N

H2N

4

growing polypeptide chain

3 4

A

E P

1

2 3

4

3 4

AE P

1

2 3

4

4

AE P

1

2 3

4

4 5

5

AP

1

2 3 4

4 5

5

STEP 1

STEP 2

STEP 3

STEP 1

STEP 2

5¢ 3¢

5¢ 3¢

5¢ 3¢

5¢ 3¢

5¢ 3¢

3

Page 48: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

ribosome without being used for protein synthesis, and this two-step mechanismis largely responsible for the 99.99% accuracy of the ribosome in translatingproteins.

Recent discoveries indicate that EF-Tu may have an additional role in rais-ing the overall accuracy of translation. Earlier in this chapter, we discussed thekey role of aminoacyl synthetases in accurately matching amino acids to tRNAs.As the GTP-bound form of EF-Tu escorts aminoacyl-tRNAs to the ribosome (seeFigure 6–66), it apparently double-checks for the proper correspondencebetween amino acid and tRNA and rejects those that are mismatched. Exactlyhow this is accomplished is not well-understood, but it may involve the overallbinding energy between EF-Tu and the aminoacyl-tRNA. According to this idea,correct matches have a narrowly defined affinity for EF-Tu, and incorrectmatches bind either too strongly or too weakly. EF-Tu thus appears to discrimi-nate, albeit crudely, among many different amino acid-tRNA combinations,selectively allowing only the correct ones to enter the ribosome.

The Ribosome Is a Ribozyme

The ribosome is a very large and complex structure, composed of two-thirdsRNA and one-third protein. The determination, in 2000, of the entire three-dimensional structure of its large and small subunits is a major triumph of mod-ern structural biology. The structure strongly confirms the earlier evidence thatrRNAs—and not proteins—are responsible for the ribosome’s overall structure,its ability to position tRNAs on the mRNA, and its catalytic activity in formingcovalent peptide bonds. Thus, for example, the ribosomal RNAs are folded into

346 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

Figure 6–66 Detailed view of the translation cycle. The outline oftranslation presented in Figure 6–65 has been supplemented with additionalfeatures, including the participation of elongation factors and a mechanism bywhich translational accuracy is improved. In the initial binding event (toppanel) an aminoacyl-tRNA molecule that is tightly bound to EF-Tu pairstransiently with the codon at the A-site in the small subunit. During this step(second panel), the tRNA occupies a hybrid-binding site on the ribosome.Thecodon–anticodon pairing triggers GTP hydrolysis by EF-Tu causing it todissociate from the aminoacyl-tRNA, which now enters the A-site (fourthpanel) and can participate in chain elongation. A delay between aminoacyl-tRNA binding and its availability for protein synthesis is thereby inserted intothe protein synthesis mechanism.As described in the text, this delayincreases the accuracy of translation. In subsequent steps, elongation factorEF-G in the GTP-bound form enters the ribosome and binds in or near theA-site on the large ribosomal subunit, accelerating the movement of the twobound tRNAs into the A/P and P/E hybrid states. Contact with the ribosomestimulates the GTPase activity of EF-G, causing a dramatic conformationalchange in EF-G as it switches from the GTP to the GDP-bound form.Thischange moves the tRNA bound to the A/P hybrid state to the P-site andadvances the cycle of translation forward by one codon.

During each cycle of translation elongation, the tRNAs molecules movethrough the ribosome in an elaborate series of gyrations during which theytransiently occupy several “hybrid” binding states. In one, the tRNA issimultaneously bound to the A site of the small subunit and the P site of thelarge subunit; in another, the tRNA is bound to the P site of the smallsubunit and the E site of the large subunit. In a single cycle, a tRNA moleculeis considered to occupy six different sites, the initial binding site (called theA/T hybrid state), the A/A site, the A/P hybrid state, the P/P site, the P/Ehybrid state, and the E-site. Each tRNA is thought to ratchet through thesepositions, undergoing rotations along its long axis at each change in location.

EF-Tu and EF-G are the designations used for the bacterial elongationfactors; in eucaryotes, they are called EF-1 and EF-2, respectively.Thedramatic change in the three-dimensional structure of EF-Tu that is causedby GTP hydrolysis was illustrated in Figure 3–74. For each peptide bondformed, a molecule of EF-Tu and EF-G are each released in their inactive,GDP-bound forms.To be used again, these proteins must have their GDPexchanged for GTP. In the case of EF-Tu, this exchange is performed by aspecific member of a large class of proteins known as GTP exchange factors.

E P A

mRNA

E P A

E P A

E P A

E

EF-Tu

incorrectly base-paired tRNAspreferentiallydissociate

incorrectly base-paired tRNAspreferentiallydissociate

E P A

EF-G

GTP

GTP

GTP

GDP

GDP

GDP

Pi

Pi

GTP

Page 49: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

highly compact, precise three-dimensional structures that form the compactcore of the ribosome and thereby determine its overall shape (Figure 6–67).

In marked contrast to the central positions of the rRNA, the ribosomal pro-teins are generally located on the surface and fill in the gaps and crevices of thefolded RNA (Figure 6–68). Some of these proteins contain globular domains onthe ribosome surface that send out extended regions of polypeptide chain thatpenetrate short distances into holes in the RNA core (Figure 6–69). The main role

FROM RNA TO PROTEIN 347

(B) (C)(A)

domain II

5S rRNA

L1

domain V

domain I

domain III

domain IV

domain VI

(A)

(B)

domain II

domain IIIdomain IV

domain V

domain VI

domain I

Figure 6–67 Structure of the rRNAs in the large subunit of a bacterial ribosome, as determinedby x-ray crystallography. (A) Three-dimensional structures of the large-subunit rRNAs (5S and 23S) as theyappear in the ribosome. One of the protein subunits of the ribosome (L1) is also shown as a reference point,since it forms a characteristic protrusion on the ribosome. (B) Schematic diagram of the secondary structureof the 23S rRNA showing the extensive network of base-pairing.The structure has been divided into sixstructural ‘domains’ whose colors correspond to those of the three-dimensional structure in (A).Thesecondary-structure diagram is highly schematized to represent as much of the structure as possible in twodimensions.To do this, several discontinuities in the RNA chain have been introduced, although in reality the23S RNA is a single RNA molecule. For example, the base of Domain III is continuous with the base ofDomain IV even though a gap appears in the diagram. (Adapted from N. Ban et al., Science 289:905–920, 2000.)

Figure 6–68 Location of the protein components of the bacterial large ribosomal subunit. The rRNAs (5S and 23S) are depicted ingray and the large-subunit proteins (27 of the 31 total) in gold. For convenience, the protein structures depict only the polypeptide backbones.(A) View of the interface with the small subunit, the same view shown in Figure 6–64B. (B) View of the back of the large subunit, obtained byrotating (A) by 180° around a vertical axis. (C) View of the bottom of the large subunit showing the peptide exit channel in the center of thestructure. (From N. Ban et al., Science 289:905–920, 2000. © AAAS.)

Page 50: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

of the ribosomal proteins seems to be to stabilize the RNA core, while permittingthe changes in rRNA conformation that are necessary for this RNA to catalyzeefficient protein synthesis.

Not only are the three binding sites for tRNAs (the A-, P-, and E-sites) on theribosome formed primarily by the ribosomal RNAs, but the catalytic site for pep-tide bond formation is clearly formed by the 23S RNA, with the nearest aminoacid located more than 1.8 nm away. This RNA-based catalytic site for peptidyltransferase is similar in many respects to those found in some proteins; it is ahighly structured pocket that precisely orients the two reactants (the growingpeptide chain and an aminoacyl-tRNA), and it provides a functional group to actas a general acid–base catalyst—in this case apparently, a ring nitrogen of ade-nine, instead of an amino acid side chain such as histidine (Figure 6–70). Theability of an RNA molecule to act as such a catalyst was initially surprisingbecause RNA was thought to lack an appropriate chemical group that couldboth accept and donate a proton. Although the pK of adenine-ring nitrogens isusually around 3.5, the three-dimensional structure and charge distribution ofthe 23S rRNA active site force the pK of this apparently critical adenine into theneutral range and thereby create the enzymatic activity.

RNA molecules that possess catalytic activity are known as ribozymes. Wesaw earlier in this chapter how other ribozymes function in RNA-splicing reac-tions (for example, see Figure 6–36). In the final section of this chapter, weconsider what the recently recognized ability of RNA molecules to function ascatalysts for a wide variety of different reactions might mean for the early evolu-tion of living cells. Here we need only note that there is good reason to suspectthat RNA rather than protein molecules served as the first catalysts for livingcells. If so, the ribosome, with its RNA core, might be viewed as a relic of an ear-lier time in life’s history—when protein synthesis evolved in cells that were runalmost entirely by ribozymes.

Nucleotide Sequences in mRNA Signal Where to Start Protein Synthesis

The initiation and termination of translation occur through variations on thetranslation elongation cycle described above. The site at which protein synthe-sis begins on the mRNA is especially crucial, since it sets the reading frame forthe whole length of the message. An error of one nucleotide either way at thisstage would cause every subsequent codon in the message to be misread, so thata nonfunctional protein with a garbled sequence of amino acids would result.The initiation step is also of great importance in another respect, since for mostgenes it is the last point at which the cell can decide whether the mRNA is to betranslated and the protein synthesized; the rate of initiation thus determines therate at which the protein is synthesized. We shall see in Chapter 7 that cells useseveral mechanisms to regulate translation initiation.

The translation of an mRNA begins with the codon AUG, and a special tRNAis required to initiate translation. This initiator tRNA always carries the aminoacid methionine (in bacteria, a modified form of methionine—formylmethion-ine—is used) so that all newly made proteins have methionine as the first aminoacid at their N-terminal end, the end of a protein that is synthesized first. This

348 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

Figure 6–69 Structure of the L15protein in the large subunit of thebacterial ribosome. The globulardomain of the protein lies on the surfaceof the ribosome and an extended regionpenetrates deeply into the RNA core ofthe ribosome.The L15 protein is shown inyellow and a portion of the ribosomalRNA core is shown in red. (Courtesy ofD. Klein, P.B. Moore and T.A. Steitz.)

Figure 6–70 A possible reactionmechanism for the peptidyltransferase activity present in thelarge ribosomal subunit. The overallreaction is catalyzed by an active site inthe 23S rRNA. In the first step of theproposed mechanism, the N3 of theactive-site adenine abstracts a protonfrom the amino acid attached to thetRNA at the ribosome’s A-site, allowing itsamino nitrogen to attack the carboxylgroup at the end of the growing peptidechain. In the next step this protonatedadenine donates its hydrogen to theoxygen linked to the peptidyl-tRNA,causing this tRNA’s release from thepeptide chain.This leaves a polypeptidechain that is one amino acid longer thanthe starting reactants.The entire reactioncycle would then repeat with the nextaminoacyl tRNA that enters the A-site.(Adapted from P. Nissen et al., Science289:920–930, 2000.)

N

N

N

N NH2

N

R

O-tRNA

tRNA-O

tRNA-O

tRNA-O

Onascentprotein

nascentprotein

nascentprotein

O

H

H

N

N

N

N NH2

NH

R

O-tRNA

O

NH

R

O-tRNA

O

OO

H H

P siteA site

+N

N

N

N NH2

chargedtRNA

adenine atactive site ofribosome

Page 51: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

methionine is usually removed later by a specific protease. The initiator tRNAhas a nucleotide sequence distinct from that of the tRNA that normally carriesmethionine.

In eucaryotes, the initiator tRNA (which is coupled to methionine) is firstloaded into the small ribosomal subunit along with additional proteins calledeucaryotic initiation factors, or eIFs (Figure 6–71). Of all the aminoacyl tRNAsin the cell, only the methionine-charged initiator tRNA is capable of tightlybinding the small ribosome subunit without the complete ribosome present.Next, the small ribosomal subunit binds to the 5¢ end of an mRNA molecule,which is recognized by virtue of its 5¢ cap and its two bound initiation factors,eIF4E (which directly binds the cap) and eIF4G (see Figure 6–40). The small ribo-somal subunit then moves forward (5¢ to 3¢) along the mRNA, searching for thefirst AUG. This movement is facilitated by additional initiation factors that act asATP-powered helicases, allowing the small subunit to scan through RNA sec-ondary structure. In 90% of mRNAs, translation begins at the first AUG encoun-tered by the small subunit. At this point, the initiation factors dissociate from thesmall ribosomal subunit to make way for the large ribosomal subunit to assem-ble with it and complete the ribosome. The initiator tRNA is now bound to theP-site, leaving the A-site vacant. Protein synthesis is therefore ready to beginwith the addition of the next aminoacyl tRNA molecule (see Figure 6–71).

The nucleotides immediately surrounding the start site in eucaryoticmRNAs influence the efficiency of AUG recognition during the above scanningprocess. If this recognition site is quite different from the consensus recognitionsequence, scanning ribosomal subunits will sometimes ignore the first AUGcodon in the mRNA and skip to the second or third AUG codon instead. Cells fre-quently use this phenomenon, known as “leaky scanning,” to produce two ormore proteins, differing in their N-termini, from the same mRNA molecule. Itallows some genes to produce the same protein with and without a signalsequence attached at its N-terminus, for example, so that the protein is directedto two different compartments in the cell.

The mechanism for selecting a start codon in bacteria is different. BacterialmRNAs have no 5¢ caps to tell the ribosome where to begin searching for thestart of translation. Instead, each bacterial mRNA contains a specific ribosome-binding site (called the Shine–Dalgarno sequence, named after its discoverers)that is located a few nucleotides upstream of the AUG at which translation is tobegin. This nucleotide sequence, with the consensus 5¢-AGGAGGU-3¢, formsbase pairs with the 16S rRNA of the small ribosomal subunit to position the ini-tiating AUG codon in the ribosome. A set of translation initiation factors orches-trates this interaction, as well as the subsequent assembly of the large ribosomalsubunit to complete the ribosome.

Unlike a eucaryotic ribosome, a bacterial ribosome can therefore readilyassemble directly on a start codon that lies in the interior of an mRNA molecule,so long as a ribosome-binding site precedes it by several nucleotides. As a result,bacterial mRNAs are often polycistronic—that is, they encode several differentproteins, each of which is translated from the same mRNA molecule (Figure6–72). In contrast, a eucaryotic mRNA generally encodes only a single protein.

Stop Codons Mark the End of Translation

The end of the protein-coding message is signaled by the presence of one ofthree codons (UAA, UAG, or UGA) called stop codons (see Figure 6–50). These arenot recognized by a tRNA and do not specify an amino acid, but instead signal

FROM RNA TO PROTEIN 349

Figure 6–71 The initiation phase of protein synthesis in eucaryotes.Only three of the many translation initiation factors required for thisprocess are shown. Efficient translation initiation also requires the poly-A tailof the mRNA bound by poly-A-binding proteins which, in turn, interact witheIF4G. In this way, the translation apparatus ascertains that both ends of themRNA are intact before initiating (see Figure 6–40).Although only one GTPhydrolysis event is shown in the figure, a second is known to occur justbefore the large and small ribosomal subunits join.

A

E P A

Met

E P A

Met

Met

Met

5¢ 3¢

5¢ 3¢

5¢ 3¢

5¢ 3¢

mRNA

AUG

aa

aa

initiator tRNA

eIF-2

small ribosomal subunit with initiator tRNA bound

INITIATOR tRNAMOVES ALONG RNA SEARCHING FOR FIRST AUG

eIF-2 AND OTHERINITIATION FACTORS DISSOCIATE

LARGERIBOSOMALSUBUNITBINDS

AMINOACYL-tRNA BINDS(step 1)

FIRST PEPTIDEBOND FORMS (step 2)

etc.

AUG

AUG

AUG

EP

A

Met

GTP

GTP

GTP

GDP

eIF-4G

eIF-4E 5¢ cap

additionalinitiation factors

E P

Met

5¢ 3¢

aa

AUG

ATP

ADP

mRNA

Pi +

Pi +

AAAAAAAA

Page 52: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

to the ribosome to stop translation. Proteins known as release factors bind to anyribosome with a stop codon positioned in the A site, and this binding forces thepeptidyl transferase in the ribosome to catalyze the addition of a water moleculeinstead of an amino acid to the peptidyl-tRNA (Figure 6–73). This reaction freesthe carboxyl end of the growing polypeptide chain from its attachment to a tRNAmolecule, and since only this attachment normally holds the growing polypep-tide to the ribosome, the completed protein chain is immediately released intothe cytoplasm. The ribosome then releases the mRNA and separates into thelarge and small subunits, which can assemble on another mRNA molecule tobegin a new round of protein synthesis.

Release factors provide a dramatic example of molecular mimicry, wherebyone type of macromolecule resembles the shape of a chemically unrelatedmolecule. In this case, the three-dimensional structure of release factors (madeentirely of protein) bears an uncanny resemblance to the shape and charge dis-tribution of a tRNA molecule (Figure 6–74). This shape and charge mimicryallows the release factor to enter the A-site on the ribosome and cause transla-tion termination.

During translation, the nascent polypeptide moves through a large, water-filled tunnel (approximately 10 nm ¥ 1.5 nm) in the large subunit of the ribo-some (see Figure 6–68C). The walls of this tunnel, made primarily of 23S rRNA,are a patchwork of tiny hydrophobic surfaces embedded in a more extensivehydrophilic surface. This structure, because it is not complementary to any pep-tide structure, provides a “Teflon” coating through which a polypeptide chaincan easily slide. The dimensions of the tunnel suggest that nascent proteins arelargely unstructured as they pass through the ribosome, although some a-heli-cal regions of the protein can form before leaving the ribosome tunnel. As itleaves the ribosome, a newly-synthesized protein must fold into its properthree-dimensional structure to be useful to the cell, and later in this chapter wediscuss how this folding occurs. First, however, we review several additionalaspects of the translation process itself.

Proteins Are Made on Polyribosomes

The synthesis of most protein molecules takes between 20 seconds and severalminutes. But even during this very short period, multiple initiations usually takeplace on each mRNA molecule being translated. As soon as the preceding ribo-some has translated enough of the nucleotide sequence to move out of the way,the 5¢ end of the mRNA is threaded into a new ribosome. The mRNA moleculesbeing translated are therefore usually found in the form of polyribosomes (alsoknown as polysomes), large cytoplasmic assemblies made up of several ribo-somes spaced as close as 80 nucleotides apart along a single mRNA molecule(Figure 6–75). These multiple initiations mean that many more protein

350 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

Figure 6–72 Structure of a typical bacterial mRNA molecule. Unlikeeucaryotic ribosomes, which typically require a capped 5¢ end, procaryoticribosomes initiate transcription at ribosome-binding sites (Shine–Dalgarnosequences), which can be located anywhere along an mRNA molecule.Thisproperty of ribosomes permits bacteria to synthesize more than one type ofprotein from a single mRNA molecule.

PPP

ribosome-binding sites5¢ 3¢

protein α protein β protein γ

mRNAAUG AUG AUG

E P A

E P A

E P A

5¢ 3¢

AUGAACUGGUAGCGAUCGACC

H2N

Met

AsnTrp

5¢ 3¢

AUGAACUGGUAGCGAUCGACC

H2N

Met

AsnTrp

5¢ 3¢

AUGAACUGGUAGCGAUCGACC

NH2

Met

Asn

TrpCOOH

5¢ 3¢

AUGAACUGGUAGCGAUCG

ACC

TERMINATION

BINDING OFRELEASEFACTORTO THEA-SITE

H2O

EP

A

Figure 6–73 The final phase of protein synthesis. The binding of arelease factor to an A-site bearing a stop codon terminates translation.Thecompleted polypeptide is released and, after the action of a ribosome recyclingfactor (not shown), the ribosome dissociates into its two separate subunits.

Page 53: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

molecules can be made in a given time than would be possible if each had to becompleted before the next could start.

Both bacteria and eucaryotes utilize polysomes, and both employ additionalstrategies to speed up the rate of protein synthesis even further. Because bacte-rial mRNA does not need to be processed and is accessible to ribosomes while itis being made, ribosomes can attach to the free end of a bacterial mRNAmolecule and start translating it even before the transcription of that RNA iscomplete, following closely behind the RNA polymerase as it moves along DNA.In eucaryotes, as we have seen, the 5¢ and 3¢ ends of the mRNA interact (see Fig-ures 6–40 and 6–75A); therefore, as soon as a ribosome dissociates, its two sub-units are in an optimal position to reinitiate translation on the same mRNAmolecule.

Quality-Control Mechanisms Operate at Many Stages of Translation

Translation by the ribosome is a compromise between the opposing constraintsof accuracy and speed. We have seen, for example, that the accuracy of transla-tion (1 mistake per 104 amino acids joined) requires a time delay each time anew amino acid is added to a growing polypeptide chain, producing an overall

FROM RNA TO PROTEIN 351

stop codon

UAG AUG

startcodon

messenger RNA(mRNA)

100 nm

polypeptidechain

(A)

AAAAAAAAAeIF-4G

eIF-4E

5¢ cap

poly-A-bindingprotein

(B)

100 nm

Figure 6–75 A polyribosome.(A) Schematic drawing showing how aseries of ribosomes can simultaneouslytranslate the same eucaryotic mRNAmolecule. (B) Electron micrograph of apolyribosome from a eucaryotic cell.(B, courtesy of John Heuser.)

Figure 6–74 The structure of ahuman translation release factor(eRF1) and its resemblance to atRNA molecule. The protein is on theleft and the tRNA on the right. (From H. Song et al., Cell 100:311–321, 2000.© Elsevier.)

Page 54: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

speed of translation of 20 amino acids incorporated per second in bacteria.Mutant bacteria with a specific alteration in their small ribosomal subunit trans-late mRNA into protein with an accuracy considerably higher than this; however,protein synthesis is so slow in these mutants that the bacteria are barely able tosurvive.

We have also seen that attaining the observed accuracy of protein synthesisrequires the expenditure of a great deal of free energy; this is expected, since, asdiscussed in Chapter 2, a price must be paid for any increase in order in the cell.In most cells, protein synthesis consumes more energy than any other biosyn-thetic process. At least four high-energy phosphate bonds are split to make eachnew peptide bond: two are consumed in charging a tRNA molecule with anamino acid (see Figure 6–56), and two more drive steps in the cycle of reactionsoccurring on the ribosome during synthesis itself (see Figure 6–66). In addition,extra energy is consumed each time that an incorrect amino acid linkage ishydrolyzed by a tRNA synthetase (see Figure 6–59) and each time that an incor-rect tRNA enters the ribosome, triggers GTP hydrolysis, and is rejected (Figure6–66). To be effective, these proofreading mechanisms must also remove anappreciable fraction of correct interactions; for this reason they are even morecostly in energy than they might seem.

Other quality control mechanisms ensure that a eucaryotic mRNA moleculeis complete before ribosomes even begin to translate it. Translating broken orpartly processed mRNAs would be harmful to the cell, because truncated or oth-erwise aberrant proteins would be produced. In eucaryotes, we have seen thatmRNA production involves not only transcription but also a series of elaborateRNA-processing steps; these take place in the nucleus, segregated from ribo-somes, and only when the processing is complete are the mRNAs transported tothe cytoplasm to be translated (see Figure 6–40). An mRNA molecule that wasintact when it left the nucleus can, however, become broken in the cytosol. Toavoid translating such broken mRNA molecules, the 5¢ cap and the poly-A tailare both recognized by the translation-initiation apparatus before translationbegins (see Figures 6–71 and 6–75).

Bacteria solve the problem of incomplete mRNAs in an entirely differentway. Not only are there no signals at the 3¢ ends of bacterial mRNAs, but also, aswe have seen, translation often begins before the synthesis of the transcript hasbeen completed. When the bacterial ribosome translates to the end of an incom-plete RNA, a special RNA (called tmRNA) enters the A-site of the ribosome andis itself translated; this adds a special 11 amino acid tag to the C terminus of thetruncated protein that signals to proteases that the entire protein is to bedegraded (Figure 6–76).

There Are Minor Variations in the Standard Genetic Code

As discussed in Chapter 1, the genetic code (shown in Figure 6–50) applies to allthree major branches of life, providing important evidence for the commonancestry of all life on Earth. Although rare, there are exceptions to this code, andwe discuss some of them in this section. For example, Candida albicans, themost prevalent human fungal pathogen, translates the codon CUG as serine,whereas nearly all other organisms translate it as leucine. Mitochondria (whichhave their own genomes and encode much of their translational apparatus) alsoshow several deviations from the standard code. For example, in mammalianmitochondria AUA is translated as methionine, whereas in the cytosol of the cellit is translated as isoleucine (see Table 14–3, p. 814).

352 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

E P A

H2N

Ala

tmRNA

ribosome stalled onbroken mRNA

E P A

Ala

E P A

Ala

broken RNArejected

E P A

Ala Ala

elongation resumesusing codons of tmRNA

H2N

incomplete protein 11-aminoacid tag

degraded by tag-specific proteases

Figure 6–76 The rescue of a bacterial ribosome stalled on anincomplete mRNA molecule. The tmRNA shown is a 363-nucleotideRNA with both tRNA and mRNA functions, hence its name. It carries analanine and can enter the vacant A-site of a stalled ribosome to add thisalanine to a polypeptide chain, mimicking a tRNA except that no codon ispresent to guide it.The ribosome then translates ten codons from thetmRNA, completing an 11-amino acid tag on the protein.This tag isrecognized by proteases that then degrade the entire protein.

Page 55: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

The type of deviation in the genetic code discussed above is “hardwired”into the organisms or the organelles in which it occurs. A different type of varia-tion, sometimes called translational recoding, occurs in many cells. In this case,other nucleotide sequence information present in an mRNA can change themeaning of the genetic code at a particular site in the mRNA molecule. The stan-dard code allows cells to manufacture proteins using only 20 amino acids. How-ever, bacteria, archaea, and eucaryotes have available to them a twenty-firstamino acid that can be incorporated directly into a growing polypeptide chainthrough translational recoding. Selenocysteine, which is essential for the effi-cient function of a variety of enzymes, contains a selenium atom in place of thesulfur atom of cysteine. Selenocysteine is produced from a serine attached to aspecial tRNA molecule that base-pairs with the UGA codon, a codon normallyused to signal a translation stop. The mRNAs for proteins in which selenocys-teine is to be inserted at a UGA codon carry an additional nucleotide sequencein the mRNA nearby that causes this recoding event (Figure 6–77).

Another form of recoding is translational frameshifting. This type of recod-ing is commonly used by retroviruses, a large group of eucaryotic viruses, inwhich it allows more than one protein to be synthesized from a single mRNA.These viruses commonly make both the capsid proteins (Gag proteins) and theviral reverse transcriptase and integrase (Pol proteins) from the same RNAtranscript (see Figure 5–73). Such a virus needs many more copies of the Gagproteins than it does of the Pol proteins, and they achieve this quantitativeadjustment by encoding the pol genes just after the gag genes but in a differ-ent reading frame. A stop codon at the end of the gag coding sequence can bebypassed on occasion by an intentional translational frameshift that occursupstream of it. This frameshift occurs at a particular codon in the mRNA andrequires a specific recoding signal, which seems to be a structural feature of theRNA sequence downstream of this site (Figure 6–78).

FROM RNA TO PROTEIN 353

Figure 6–77 Incorporation of selenocysteine into a growing polypeptide chain. A specialized tRNAis charged with serine by the normal seryl-tRNA synthetase, and the serine is subsequently convertedenzymatically to selenocysteine.A specific RNA structure in the mRNA (a stem and loop structure with aparticular nucleotide sequence) signals that selenocysteine is to be inserted at the neighboring UGA codon.As indicated, this event requires the participation of a selenocysteine-specific translation factor.

Figure 6–78 The translationalframeshifting that produces thereverse transcriptase and integraseof a retrovirus. The viral reversetranscriptase and integrase are producedby proteolytic processing of a largeprotein (the Gag–Pol fusion protein)consisting of both the Gag and Pol aminoacid sequences.The viral capsid proteinsare produced by proteolytic processing ofthe more abundant Gag protein. Both theGag and the Gag–Pol fusion proteins startidentically, but the Gag protein terminatesat an in-frame stop codon (not shown);the indicated frameshift bypasses this stopcodon, allowing the synthesis of the longerGag–Pol fusion protein.The frameshiftoccurs because features in the local RNAstructure (including the RNA loop shown)cause the tRNALeu attached to the C-terminus of the growing polypeptide chainoccasionally to slip backward by onenucleotide on the ribosome, so that itpairs with a UUU codon instead of theUUA codon that had initially specified itsincorporation; the next codon (AGG) inthe new reading frame specifies anarginine rather than a glycine.Thiscontrolled slippage is due in part to astem and loop structure that forms in theviral mRNA, as indicated in the figure.Thesequence shown is from the human AIDSvirus, HIV. (Adapted from T. Jacks et al.,Nature 331:280–283, 1988.)

E P A

H2N

A C UU G A

A C UA C UA C U

GTP

GTP

selenocysteineadded to growing

peptide chain

selenocysteinetRNA

seryl-tRNAsynthetase

serineenzymaticallyconverted to

selenocysteine

serine

SS SC

selenocysteine-specifictranslation factor

signal that thepreceding UGA

encodes selenocysteine

SC

UUUUUA G GGUUUUUA G GG

Phe Leu Gly Phe Leu Arg

5¢ 3¢viral RNA

RNA loop

5¢ 3¢ 5¢ 3¢

H2N etc.

H2N COOH

H2N etc.

H2N COOH

_1 frameshift followingLeu incorporation

RNA

protein

Gag protein Gag–Pol fusion protein

NO FRAMESHIFT (90% of ribosomes) FRAMESHIFT (10% of ribosomes)

Page 56: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

Many Inhibitors of Procaryotic Protein Synthesis Are Useful as Antibiotics

Many of the most effective antibiotics used in modern medicine are compoundsmade by fungi that act by inhibiting bacterial protein synthesis. Some of thesedrugs exploit the structural and functional differences between bacterial andeucaryotic ribosomes so as to interfere preferentially with the function of bacte-rial ribosomes. Thus some of these compounds can be taken in high doses with-out undue toxicity to humans. Because different antibiotics bind to differentregions of bacterial ribosomes, they often inhibit different steps in the syntheticprocess. Some of the more common antibiotics of this kind are listed in Table6–3 along with several other inhibitors of protein synthesis, some of which acton eucaryotic cells and therefore cannot be used as antibiotics.

Because they block specific steps in the processes that lead from DNA toprotein, many of the compounds listed in Table 6–3 are useful for cell biologicalstudies. Among the most commonly used drugs in such experimental studies arechloramphenicol, cycloheximide, and puromycin, all of which specifically inhibitprotein synthesis. In a eucaryotic cell, for example, chloramphenicol inhibitsprotein synthesis on ribosomes only in mitochondria (and in chloroplasts inplants), presumably reflecting the procaryotic origins of these organelles (dis-cussed in Chapter 14). Cycloheximide, in contrast, affects only ribosomes in thecytosol. Puromycin is especially interesting because it is a structural analog of atRNA molecule linked to an amino acid and is therefore another example ofmolecular mimicry; the ribosome mistakes it for an authentic amino acid andcovalently incorporates it at the C-terminus of the growing peptide chain,thereby causing the premature termination and release of the polypeptide. Asmight be expected, puromycin inhibits protein synthesis in both procaryotesand eucaryotes.

Having described the translation process itself, we now discuss how itsproducts—the proteins of the cell—fold into their correct three-dimensionalconformations.

A Protein Begins to Fold While It Is Still Being Synthesized

The process of gene expression is not over when the genetic code has been usedto create the sequence of amino acids that constitutes a protein. To be useful to thecell, this new polypeptide chain must fold up into its unique three-dimensional

354 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

TABLE 6–3 Inhibitors of Protein or RNA Synthesis

INHIBITOR SPECIFIC EFFECT

Acting only on bacteria

Tetracycline blocks binding of aminoacyl-tRNA to A-site of ribosome

Streptomycin prevents the transition from initiation complex to chain-elongating ribosome and also causes miscoding

Chloramphenicol blocks the peptidyl transferase reaction on ribosomes (step 2 in Figure 6–65)

Erythromycin blocks the translocation reaction on ribosomes (step 3 in Figure 6–65)

Rifamycin blocks initiation of RNA chains by binding to RNA polymerase (prevents RNA synthesis)

Acting on bacteria and eucaryotes

Puromycin causes the premature release of nascent polypeptide chains by its addition to growing chain end

Actinomycin D binds to DNA and blocks the movement of RNA polymerase (prevents RNA synthesis)

Acting on eucaryotes but not bacteria

Cycloheximide blocks the translocation reaction on ribosomes (step 3 in Figure 6–65)

Anisomycin blocks the peptidyl transferase reaction on ribosomes (step 2 in Figure 6–65)

a-Amanitin blocks mRNA synthesis by binding preferentially to RNA polymerase IIThe ribosomes of eucaryotic mitochondria (and chloroplasts) often resemble those of bacteria in their sensitivity to inhibitors. Therefore, some of these antibiotics can have a deleterious effect on human mitochondria.

Page 57: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

conformation, bind any small-molecule cofactors required for its activity, beappropriately modified by protein kinases or other protein-modifying enzymes,and assemble correctly with the other protein subunits with which it functions(Figure 6–79).

The information needed for all of the protein maturation steps listed aboveis ultimately contained in the sequence of linked amino acids that the ribosomeproduces when it translates an mRNA molecule into a polypeptide chain. As dis-cussed in Chapter 3, when a protein folds into a compact structure, it buriesmost of its hydrophobic residues in an interior core. In addition, large numbersof noncovalent interactions form between various parts of the molecule. It is thesum of all of these energetically favorable arrangements that determines thefinal folding pattern of the polypeptide chain—as the conformation of lowestfree energy (see p. 134).

Through many millions of years of evolutionary time, the amino acidsequence of each protein has been selected not only for the conformation thatit adopts but also for an ability to fold rapidly, as its polypeptide chain spins outof the ribosome starting from the N-terminal end. Experiments have demon-strated that once a protein domain in a multi-domain protein emerges from theribosome, it forms a compact structure within a few seconds that contains mostof the final secondary structure (a helices and b sheets) aligned in roughly theright way (Figure 6–80). For many protein domains, this unusually open andflexible structure, which is called a molten globule, is the starting point for a rel-atively slow process in which many side-chain adjustments occur that eventu-ally form the correct tertiary structure. Nevertheless, because it takes severalminutes to synthesize a protein of average size, a great deal of the folding processis complete by the time the ribosome releases the C-terminal end of a protein(Figure 6–81).

Molecular Chaperones Help Guide the Folding of Many Proteins

The folding of many proteins is made more efficient by a special class of proteinscalled molecular chaperones. The latter proteins are useful for cells becausethere are a variety of different paths that can be taken to convert the moltenglobule form of a protein to the protein’s final compact conformation. For manyproteins, some of the intermediates formed along the way would aggregate andbe left as off-pathway dead ends without the intervention of a chaperone thatresets the folding process (Figure 6–82).

Molecular chaperones were first identified in bacteria when E. coli mutantsthat failed to allow bacteriophage lambda to replicate in them were studied.

FROM RNA TO PROTEIN 355

Figure 6–80 The structure of amolten globule. (A) A molten globuleform of cytochrome b562 is more openand less highly ordered than the finalfolded form of the protein, shown in (B).Note that the molten globule containsmost of the secondary structure of thefinal form, although the ends of the ahelices are frayed and one of the helices isonly partly formed. (Courtesy of Joshua Wand, from Y. Feng et al., Nat. Struct.Biol. 1:30–35, 1994.)(A) (B)

P

P

nascent polypeptide chain

folding andcofactor binding(non-covalent interactions)

covalent modificationby glycosylation,phosphorylation,acetylation etc.

binding to otherprotein subunits

mature functional protein

Figure 6–79 Steps in the creation ofa functional protein. As indicated,translation of an mRNA sequence into anamino acid sequence on the ribosome isnot the end of the process of forming aprotein.To be useful to the cell, thecompleted polypeptide chain must foldcorrectly into its three-dimensionalconformation, bind any cofactors required,and assemble with its partner proteinchains (if any).These changes are driven bynoncovalent bond formation.As indicated,many proteins also have covalentmodifications made to selected aminoacids.Although the most frequent of theseare protein glycosylation and proteinphosphorylation, more than 100 differenttypes of covalent modifications are known(see, for example, Figure 4–35).

Page 58: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

These mutant cells produce slightly altered versions of the chaperone machin-ery, and as a result they are defective in specific steps in the assembly of the viralproteins. The molecular chaperones are included among the heat-shock proteins(hence their designation as hsp), because they are synthesized in dramaticallyincreased amounts after a brief exposure of cells to an elevated temperature (forexample, 42°C for cells that normally live at 37°C). This reflects the operation ofa feedback system that responds to any increase in misfolded proteins (such asthose produced by elevated temperatures) by boosting the synthesis of thechaperones that help these proteins refold.

Eucaryotic cells have at least two major families of molecular chaperones—known as the hsp60 and hsp70 proteins. Different family members function indifferent organelles. Thus, as discussed in Chapter 12, mitochondria containtheir own hsp60 and hsp70 molecules that are distinct from those that functionin the cytosol, and a special hsp70 (called BIP) helps to fold proteins in the endo-plasmic reticulum.

356 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

Figure 6–81 The co-translationalfolding of a protein. A growingpolypeptide chain is shown acquiring itssecondary and tertiary structure as itemerges from a ribosome.The N-terminaldomain folds first, while the C-terminaldomain is still being synthesized. In thiscase, the protein has not yet achieved itsfinal conformation by the time it isreleased from the ribosome. (Modifiedfrom A.N. Federov and T.O. Baldwin,J. Biol. Chem. 272:32715–32718, 1997.)

mRNA ribosome

growingpolypeptidechain

foldedN-terminaldomain

foldingC-terminaldomain

folding of protein completed afterrelease from ribosome

Figure 6–82 A current view ofprotein folding. Each domain of a newlysynthesized protein rapidly attains a“molten globule” state. Subsequent foldingoccurs more slowly and by multiplepathways, often involving the help of amolecular chaperone. Some moleculesmay still fail to fold correctly; as explainedshortly, these are recognized and degradedby specific proteases.

moltenglobule

correctlyfoldedprotein

proteasepathway

ON-PATHWAYFOLDING

OFF-PATHWAYFOLDING

IRRETRIEVABLEACCIDENTS

chaperonecatalysis

chaperonecatalysis

Page 59: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

The hsp60-like and hsp70 proteins each work with their own small set ofassociated proteins when they help other proteins to fold. They share an affinityfor the exposed hydrophobic patches on incompletely folded proteins, and theyhydrolyze ATP, often binding and releasing their protein with each cycle of ATPhydrolysis. In other respects, the two types of hsp proteins function differently.The hsp70 machinery acts early in the life of many proteins, binding to a stringof about seven hydrophobic amino acids before the protein leaves the ribosome(Figure 6–83). In contrast, hsp60-like proteins form a large barrel-shaped struc-ture that acts later in a protein’s life, after it has been fully synthesized. This typeof chaperone forms an “isolation chamber” into which misfolded proteins arefed, preventing their aggregation and providing them with a favorable environ-ment in which to attempt to refold (Figure 6–84).

Exposed Hydrophobic Regions Provide Critical Signals for Protein Quality Control

If radioactive amino acids are added to cells for a brief period, the newly syn-thesized proteins can be followed as they mature into their final functionalform. It is this type of experiment that shows that the hsp70 proteins act first,beginning when a protein is still being synthesized on a ribosome, and that the

FROM RNA TO PROTEIN 357

ATP

ADP

correctlyfoldedprotein

incorrectlyfoldedprotein

GroES cap

hydrophobicprotein-binding

sites

(A)

hsp60-likeprotein complex

Pi+

(B)

Figure 6–84 The structure and function of the hsp60 family of molecularchaperones. (A) The catalysis of protein refolding. As indicated, a misfolded protein isinitially captured by hydrophobic interactions along one rim of the barrel.The subsequentbinding of ATP plus a protein cap increases the diameter of the barrel rim, which maytransiently stretch (partly unfold) the client protein.This also confines the protein in anenclosed space, where it has a new opportunity to fold.After about 15 seconds, ATPhydrolysis ejects the protein, whether folded or not, and the cycle repeats.This type ofmolecular chaperone is also known as a chaperonin; it is designated as hsp60 inmitochondria,TCP-1 in the cytosol of vertebrate cells, and GroEL in bacteria. As indicated,only half of the symmetrical barrel operates on a client protein at any one time. (B) Thestructure of GroEL bound to its GroES cap, as determined by x-ray crystallography. On theleft is shown the outside of the barrel-like structure and on the right a cross sectionthrough its center. (B, adapted from B. Bukace and A.L. Horwich, Cell 92:351–366, 1998.)

Figure 6–83 The hsp70 family ofmolecular chaperones. These proteinsact early, recognizing a small stretch ofhydrophobic amino acids on a protein’ssurface.Aided by a set of smaller hsp40proteins, an hsp70 monomer binds to itstarget protein and then hydrolyzes amolecule of ATP to ADP, undergoing aconformational change that causes thehsp70 to clamp down very tightly on thetarget.After the hsp40 dissociates, thedissociation of the hsp70 protein isinduced by the rapid re-binding of ATPafter ADP release. Repeated cycles of hspprotein binding and release help the targetprotein to refold, as schematicallyillustrated in Figure 6–82.

ATP ADP

ATP ADP

ribosome

hsp70machinery

hsp70machinery

correctly foldedprotein

incorrectly foldedprotein

Pi+

Page 60: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

hsp60-like proteins are called into play only later to help in folding completedproteins. However, the same experiments reveal that only a subset of the newlysynthesized proteins becomes involved: perhaps 20% of all proteins with thehsp70 and 10% with the hsp60-like molecular chaperones. How are these pro-teins selected for this ATP-catalyzed refolding?

Before answering, we need to pause to consider the post-translational fateof proteins more broadly. A protein that has a sizable exposed patch ofhydrophobic amino acids on its surface is usually abnormal: it has either failedto fold correctly after leaving the ribosome, suffered an accident that partlyunfolded it at a later time, or failed to find its normal partner subunit in a largerprotein complex. Such a protein is not merely useless to the cell, it can be dan-gerous. Many proteins with an abnormally exposed hydrophobic region canform large aggregates, precipitating out of solution. We shall see that, in rarecases, such aggregates do form and cause severe human diseases. But in thevast majority of cells, powerful protein quality control mechanisms preventsuch disasters.

Given this background, it is not surprising that cells have evolved elaboratemechanisms that recognize and remove the hydrophobic patches on proteins.Two of these mechanisms depend on the molecular chaperones just discussed,which bind to the patch and attempt to repair the defective protein by giving itanother chance to fold. At the same time, by covering the hydrophobic patches,these chaperones transiently prevent protein aggregation. Proteins that veryrapidly fold correctly on their own do not display such patches and are thereforebypassed by chaperones.

Figure 6–85 outlines all of the quality control choices that a cell makes for adifficult-to-fold, newly synthesized protein. As indicated, when attempts torefold a protein fail, a third mechanism is called into play that completelydestroys the protein by proteolysis. The proteolytic pathway begins with therecognition of an abnormal hydrophobic patch on a protein’s surface, and itends with the delivery of the entire protein to a protein destruction machine, acomplex protease known as the proteasome. As described next, this processdepends on an elaborate protein-marking system that also carries out othercentral functions in the cell by destroying selected normal proteins.

The Proteasome Degrades a Substantial Fraction of the Newly Synthesized Proteins in Cells

Cells quickly remove the failures of their translation processes. Recent experi-ments suggest that as many as one-third of the newly made polypeptide chainsare selected for rapid degradation as a result of the protein quality controlmechanisms just described. The final disposal apparatus in eucaryotes is theproteasome, an abundant ATP-dependent protease that constitutes nearly 1%of cellular protein. Present in many copies dispersed throughout the cytosol andthe nucleus, the proteasome also targets proteins of the endoplasmic reticulum(ER): those proteins that fail either to fold or to be assembled properly after theyenter the ER are detected by an ER-based surveillance system that retrotranslo-cate them back to the cytosol for degradation (discussed in Chapter 12).

Each proteasome consists of a central hollow cylinder (the 20S core protea-some) formed from multiple protein subunits that assemble as a cylindrical

358 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

Figure 6–85 The cellular mechanismsthat monitor protein quality afterprotein synthesis. As indicated, a newlysynthesized protein sometimes foldscorrectly and assembles with its partnerson its own, in which case it is left alone.Incompletely folded proteins are helped torefold by molecular chaparones: first by afamily of hsp70 proteins, and if this fails,then by hsp60-like proteins. In both casesthe client proteins are recognized by anabnormally exposed patch of hydrophobicamino acids on their surface.Theseprocesses compete with a different systemthat recognizes an abnormally exposedpatch and transfers the protein thatcontains it to a proteasome for completedestruction.The combination of all ofthese processes is needed to preventmassive protein aggregation in a cell,which can occur when many hydrophobicregions on proteins clump together andprecipitate the entire mass out ofsolution.

correctly foldedwithout help

correctly foldedwith help of a

molecularchaperone

incompletelyfolded formsdigested inproteasome

newlysynthesizedprotein

proteinaggregate

increasing time

Page 61: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

stack of four heptameric rings. Some of these subunits are distinct proteaseswhose active sites face the cylinder’s inner chamber (Figure 6–86A). Each end ofthe cylinder is normally associated with a large protein complex (the 19S cap)containing approximately 20 distinct polypeptides (Figure 6–86B). The cap sub-units include at least six proteins that hydrolyze ATP; located near the edge ofthe cylinder, these ATPases are thought to unfold the proteins to be digested andmove them into the interior chamber for proteolysis. A crucial property of theproteasome, and one reason for the complexity of its design, is the processivityof its mechanism: in contrast to a “simple” protease that cleaves a substrate’spolypeptide chain just once before dissociating, the proteasome keeps theentire substrate bound until all of it is converted into short peptides.

The 19S caps act as regulated “gates” at the entrances to the inner prote-olytic chamber, being also responsible for binding a targeted protein substrateto the proteasome. With a few exceptions, the proteasomes act on proteins thathave been specifically marked for destruction by the covalent attachment ofmultiple copies of a small protein called ubiquitin (Figure 6–87A). Ubiquitinexists in cells either free or covalently linked to a huge variety of intracellularproteins. For most of these proteins, this tagging by ubiquitin results in theirdestruction by the proteasome.

An Elaborate Ubiquitin-conjugating System Marks Proteins for Destruction

Ubiquitin is prepared for conjugation to other proteins by the ATP-dependentubiquitin-activating enzyme (E1), which creates an activated ubiquitin that istransferred to one of a set of ubiquitin-conjugating (E2) enzymes. The E2enzymes act in conjunction with accessory (E3) proteins. In the E2–E3 complex,called ubiquitin ligase, the E3 component binds to specific degradation signalsin protein substrates, helping E2 to form a multiubiquitin chain linked to alysine of the substrate protein. In this chain, the C-terminal residue of eachubiquitin is linked to a specific lysine of the preceding ubiquitin molecule, pro-ducing a linear series of ubiquitin–ubiquitin conjugates (Figure 6–87B). It is thismultiubiquitin chain on a target protein that is recognized by a specific receptorin the proteasome.

There are roughly 30 structurally similar but distinct E2 enzymes in mam-mals, and hundreds of different E3 proteins that form complexes with specificE2 enzymes. The ubiquitin–proteasome system thus consists of many distinctbut similarly organized proteolytic pathways, which have in common both theE1 enzyme at the “top” and the proteasome at the “bottom,” and differ by thecompositions of their E2–E3 ubiquitin ligases and accessory factors. Distinctubiquitin ligases recognize different degradation signals, and therefore target fordegradation distinct subsets of intracellular proteins that bear these signals.

FROM RNA TO PROTEIN 359

Figure 6–86 The proteasome.(A) A cut-away view of the structure ofthe central 20S cylinder, as determined byx-ray crystallography, with the active sitesof the proteases indicated by red dots.(B) The structure of the entireproteasome, in which the central cylinder(yellow) is supplemented by a 19S cap(blue) at each end, whose structure hasbeen determined by computer processingof electron microscope images.Thecomplex cap structure selectively bindsthose proteins that have been marked fordestruction; it then uses ATP hydrolysis tounfold their polypeptide chains and feedthem into the inner chamber of the 20Scylinder for digestion to short peptides.(B, from W. Baumeister et al., Cell92:367–380, 1998. © Elsevier.)

(A) (B)

Page 62: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

360 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

P P

ATP AMP

E1

E1

E1

E3

E2

E3

E2

E3

E2E1E1 SH

E1 SH

SH

S C O S C OS C O

COO–

ubiquitin

+

ubiquitin-activatingenzyme

binding toubiquitin

ligase

ubiquitin ligaseprimed with

ubiquitin(B)

(A)

(C)

NH2

E3

E2NH2

target protein withmultiubiquitin

chain

first ubiquitinchain added

to target protein

target proteinbound to

ubiquitin ligase

degradation signalon target protein

ε-amino groupon lysineside chain

NH2

HOOC

hydrophobicglobular core

point of attachmentto lysine side chainsof proteins

Figure 6–87 Ubiquitin and the marking of proteins withmultiubiquitin chains. (A) The three-dimensional structure of ubiquitin;this relatively small protein contains 76 amino acids. (B) The C-terminus ofubiquitin is initially activated through its high-energy thioester linkage to acysteine side chain on the E1 protein.This reaction requires ATP, and itproceeds via a covalent AMP-ubiquitin intermediate.The activated ubiquitinon E1, also known as the ubiquitin-activating enzyme, is then transferred tothe cysteines on a set of E2 molecules.These E2s exist as complexes with aneven larger family of E3 molecules. (C) The addition of a multiubiquitin chainto a target protein. In a mammalian cell there are roughly 300 distinct E2–E3complexes, each of which recognizes a different degradation signal on atarget protein by means of its E3 component.The E2s are called ubiquitin-conjugating enzymes.The E3s have been referred to traditionally as ubiquitinligases, but it is more accurate to reserve this name for the functional E2–E3 complex.

Denatured or otherwise misfolded proteins, as well as proteins containingoxidized or other abnormal amino acids, are recognized and destroyed becauseabnormal proteins tend to present on their surface amino acid sequences orconformational motifs that are recognized as degradation signals by a set of E3molecules in the ubiquitin–proteasome system; these sequences must of coursebe buried and therefore inaccessible in the normal counterparts of these pro-teins. However, a proteolytic pathway that recognizes and destroys abnormalproteins must be able to distinguish between completed proteins that have“wrong” conformations and the many growing polypeptides on ribosomes (aswell as polypeptides just released from ribosomes) that have not yet achievedtheir normal folded conformation. This is not a trivial problem; theubiquitin–proteasome system is thought to destroy some of the nascent andnewly formed protein molecules not because these proteins are abnormal assuch but because they transiently expose degradation signals that are buried intheir mature (folded) state.

Many Proteins Are Controlled by Regulated Destruction

One function of intracellular proteolytic mechanisms is to recognize and elimi-nate misfolded or otherwise abnormal proteins, as just described. Yet anotherfunction of these proteolytic pathways is to confer short half-lives on specificnormal proteins whose concentrations must change promptly with alterations

Page 63: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

in the state of a cell. Some of these short-lived proteins are degraded rapidly atall times, while many others are conditionally short-lived, that is, they aremetabolically stable under some conditions, but become unstable upon achange in the cell’s state. For example, mitotic cyclins are long-lived throughoutthe cell cycle until their sudden degradation at the end of mitosis, as explainedin Chapter 17.

How is such a regulated destruction of a protein controlled? A variety ofmechanisms are known, as illustrated through specific examples later in thisbook. In one general class of mechanism (Figure 6–88A), the activity of a ubiqui-tin ligase is turned on either by E3 phosphorylation or by an allosteric transitionin an E3 protein caused by its binding to a specific small or large molecule. Forexample, the anaphase-promoting complex (APC) is a multisubunit ubiquitinligase that is activated by a cell-cycle-timed subunit addition at mitosis. Theactivated APC then causes the degradation of mitotic cyclins and several otherregulators of the metaphase–anaphase transition (see Figure 17–20).

Alternatively, in response either to intracellular signals or to signals from theenvironment, a degradation signal can be created in a protein, causing its rapidubiquitylation and destruction by the proteasome. One common way to createsuch a signal is to phosphorylate a specific site on a protein that unmasks a nor-mally hidden degradation signal. Another way to unmask such a signal is by theregulated dissociation of a protein subunit. Finally, powerful degradation signalscan be created by a single cleavage of a peptide bond, provided that this cleavagecreates a new N-terminus that is recognized by a specific E3 as a “destabilizing”N-terminal residue (Figure 6–88B).

FROM RNA TO PROTEIN 361

Figure 6–88 Two general ways ofinducing the degradation of a specificprotein. (A) Activation of a specific E3molecule creates a new ubiquitin ligase.(B) Creation of an exposed degradationsignal in the protein to be degraded.Thissignal binds a ubiquitin ligase, causing theaddition of a multiubiquitin chain to anearby lysine on the target protein.All sixpathways shown are known to be used bycells to induce the movement of selectedproteins into the proteasome.

ATP

E2

E2

ADP

ATP

ADP

P

P

E2

E2

E2

E2

phosphorylationby protein kinase

phosphorylationby protein kinase

unmasking byprotein dissociation

creation of destabilizingN-terminus

allosteric transitioncaused by ligand binding

allosteric transitioncaused by proteinsubunit addition

(A) ACTIVATION OF A UBIQUITIN LIGASE

(B) ACTIVATION OF A DEGRADATION SIGNAL

H2O

N

N

N

C

C

C

E3

E3

E3

E3

E3

E3

Page 64: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

The N-terminal type of degradation signal arises because of the “N-endrule,” which relates the half-life of a protein in vivo to the identity of its N-ter-minal residue. There are 12 destabilizing residues in the N-end rule of the yeastS. cerevisiae (Arg, Lys, His, Phe, Leu, Tyr, Trp, Ile, Asp, Glu, Asn, and Gln), out ofthe 20 standard amino acids. The destabilizing N-terminal residues are recog-nized by a special ubiquitin ligase that is conserved from yeast to humans.

As we have seen, all proteins are initially synthesized bearing methionine (orformylmethionine in bacteria), as their N-terminal residue, which is a stabilizingresidue in the N-end rule. Special proteases, called methionine aminopeptidases,will often remove the first methionine of a nascent protein, but they will do soonly if the second residue is also stabilizing in the yeast-type N-end rule. There-fore, it was initially unclear how N-end rule substrates form in vivo. However, ithas recently been discovered that a subunit of cohesin, a protein complex thatholds sister chromatids together, is cleaved by a site-specific protease at themetaphase–anaphase transition. This cell-cycle-regulated cleavage allows sepa-ration of the sister chromatids and leads to the completion of mitosis (see Fig-ure 17–26). The C-terminal fragment of the cleaved subunit bears an N-terminalarginine, a destabilizing residue in the N-end rule. Mutant cells lacking the N-end rule pathway exhibit a greatly increased frequency of chromosome loss,presumably because a failure to degrade this fragment of the cohesin subunitinterferes with the formation of new chromatid-associated cohesin complexesin the next cell cycle.

Abnormally Folded Proteins Can Aggregate to CauseDestructive Human Diseases

When all of a cell’s protein quality controls fail, large protein aggregates tend toaccumulate in the affected cell (see Figure 6–85). Some of these aggregates, byadsorbing critical macromolecules to them, can severely damage cells and evencause cell death. The protein aggregates released from dead cells tend to accu-mulate in the extracellular matrix that surrounds the cells in a tissue, and inextreme cases they can also damage tissues. Because the brain is composed of ahighly organized collection of nerve cells, it is especially vulnerable. Not sur-prisingly, therefore, protein aggregates primarily cause diseases of neuro-degeneration. Prominent among these are Huntington’s disease and Alzheimer’sdisease—the latter causing age-related dementia in more than 20 million peoplein today’s world.

For a particular type of protein aggregate to survive, grow, and damage anorganism, it must be highly resistant to proteolysis both inside and outside thecell. Many of the protein aggregates that cause problems form fibrils built froma series of polypeptide chains that are layered one over the other as a continu-ous stack of b sheets. This so-called cross-beta filament (Figure 6–89C) tends tobe highly resistant to proteolysis. This resistance presumably explains why thisstructure is observed in so many of the neurological disorders caused by proteinaggregates, where it produces abnormally staining deposits known as amyloid.

One particular variety of these diseases has attained special notoriety. Theseare the prion diseases. Unlike Huntington’s or Alzheimer’s disease, a prion dis-ease can spread from one organism to another, providing that the second organ-ism eats a tissue containing the protein aggregate. A set of diseases—calledscrapie in sheep, Creutzfeldt–Jacob disease (CJD) in humans, and bovinespongiform encephalopathy (BSE) in cattle—are caused by a misfolded, aggre-gated form of a protein called PrP (for prion protein). The PrP is normallylocated on the outer surface of the plasma membrane, most prominently inneurons. Its normal function is not known. However, PrP has the unfortunateproperty of being convertible to a very special abnormal conformation (Figure6–89A). This conformation not only forms protease-resistant, cross-beta fila-ments; it also is “infectious” because it converts normally folded molecules ofPrP to the same form. This property creates a positive feedback loop that prop-agates the abnormal form of PrP, called PrP* (Figure 6–89B) and thereby allowsPrP to spread rapidly from cell to cell in the brain, causing the death of both

362 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

Page 65: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

animals and humans. It can be dangerous to eat the tissues of animals that con-tain PrP*, as witnessed most recently by the spread of BSE (commonly referredto as the “mad cow disease”) from cattle to humans in Great Britain.

Fortunately, in the absence of PrP*, PrP is extraordinarily difficult to convertto its abnormal form. Although very few proteins have the potential to misfoldinto an infectious conformation, a similar transformation has been discoveredto be the cause of an otherwise mysterious “protein-only inheritance” observedin yeast cells.

There Are Many Steps From DNA to Protein

We have seen so far in this chapter that many different types of chemical reac-tions are required to produce a properly folded protein from the informationcontained in a gene (Figure 6–90). The final level of a properly folded protein ina cell therefore depends upon the efficiency with which each of the many stepsis performed.

We discuss in Chapter 7 that cells have the ability to change the levels oftheir proteins according to their needs. In principle, any or all of the steps in Fig-

FROM RNA TO PROTEIN 363

heterodimer homodimeramyloid

very rareconformational

change

(A)

(B) infectious seeding of amyloid fiber formation

(C)

5 nm

(D)

Figure 6–89 Protein aggregates that cause human disease. (A) Schematic illustration of the type ofconformational change in a protein that produces material for a cross-beta filament. (B) Diagram illustratingthe self-infectious nature of the protein aggregation that is central to prion diseases. PrP is highly unusualbecause the misfolded version of the protein, called PrP*, induces the normal PrP protein it contacts tochange its conformation, as shown. Most of the human diseases caused by protein aggregation are caused bythe overproduction of a variant protein that is especially prone to aggregation, but because this structure isnot infectious in this way, it cannot spread from one animal to another. (C) Drawing of a cross-beta filament,a common type of protease-resistant protein aggregate found in a variety of human neurological diseases.Because the hydrogen-bond interactions in a b sheet form between polypeptide backbone atoms (see Figure3–9), a number of different abnormally folded proteins can produce this structure. (D) One of severalpossible models for the conversion of PrP to PrP*, showing the likely change of two a-helices into four b-strands.Although the structure of the normal protein has been determined accurately, the structure of theinfectious form is not yet known with certainty because the aggregation has prevented the use of standardstructural techniques. (C, courtesy of Louise Serpell, adapted from M. Sunde et al., J. Mol. Biol. 273:729–739,1997; D, adapted from S.B. Prusiner, Trends Biochem. Sci. 21:482–487, 1996.)

Page 66: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

ure 6–90) could be regulated by the cell for each individual protein. However, aswe shall see in Chapter 7, the initiation of transcription is the most commonpoint for a cell to regulate the expression of each of its genes. This makes sense,inasmuch as the most efficient way to keep a gene from being expressed is toblock the very first step—the transcription of its DNA sequence into an RNAmolecule.

Summary

The translation of the nucleotide sequence of an mRNA molecule into protein takesplace in the cytoplasm on a large ribonucleoprotein assembly called a ribosome.Theamino acids used for protein synthesis are first attached to a family of tRNAmolecules, each of which recognizes, by complementary base-pair interactions, par-ticular sets of three nucleotides in the mRNA (codons).The sequence of nucleotides inthe mRNA is then read from one end to the other in sets of three according to thegenetic code.

To initiate translation, a small ribosomal subunit binds to the mRNA moleculeat a start codon (AUG) that is recognized by a unique initiator tRNA molecule. Alarge ribosomal subunit binds to complete the ribosome and begin the elongationphase of protein synthesis. During this phase, aminoacyl tRNAs—each bearing aspecific amino acid bind sequentially to the appropriate codon in mRNA by formingcomplementary base pairs with the tRNA anticodon. Each amino acid is added to theC-terminal end of the growing polypeptide by means of a cycle of three sequential

364 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

Figure 6–90 The production of aprotein by a eucaryotic cell. The finallevel of each protein in a eucaryotic celldepends upon the efficiency of each stepdepicted.

3¢DNA

introns exons

INITIATION OF TRANSCRIPTION

POLYADENYLATIONTERMINATION

INITIATION OF PROTEIN SYNTHESIS

COMPLETION OF PROTEIN SYNTHESISAND PROTEIN FOLDING

mRNA DEGRADATION

PROTEIN DEGRADATION

EXPORT

CAPPINGELONGATIONSPLICING

AAAA mRNA

AAAA

AAAA

mRNA

NUCLEUS

CYTOSOL

cap

COOH

H2N

H2N

COOH

poly-A tail

Page 67: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

steps: aminoacyl-tRNA binding, followed by peptide bond formation, followed byribosome translocation.The mRNA molecule progresses codon by codon through theribosome in the 5¢-to-3¢ direction until one of three stop codons is reached. A releasefactor then binds to the ribosome, terminating translation and releasing the com-pleted polypeptide.

Eucaryotic and bacterial ribosomes are closely related, despite differences in thenumber and size of their rRNA and protein components.The rRNA has the dominantrole in translation, determining the overall structure of the ribosome, forming thebinding sites for the tRNAs, matching the tRNAs to codons in the mRNA, and provid-ing the peptidyl transferase enzyme activity that links amino acids together duringtranslation.

In the final steps of protein synthesis, two distinct types of molecular chaperonesguide the folding of polypeptide chains. These chaperones, known as hsp60 andhsp70, recognize exposed hydrophobic patches on proteins and serve to prevent theprotein aggregation that would otherwise compete with the folding of newly synthe-sized proteins into their correct three-dimensional conformations.This protein fold-ing process must also compete with a highly elaborate quality control mechanismthat destroys proteins with abnormally exposed hydrophobic patches. In this case,ubiquitin is covalently added to a misfolded protein by a ubiquitin ligase, and theresulting multiubiquitin chain is recognized by the cap on a proteasome to move theentire protein to the interior of the proteasome for proteolytic degradation. A closelyrelated proteolytic mechanism, based on special degradation signals recognized byubiquitin ligases, is used to determine the lifetime of many normally folded proteins.By this method, selected normal proteins are removed from the cell in response tospecific signals.

THE RNA WORLD AND THE ORIGINS OF LIFETo fully understand the processes occurring in present-day living cells, we needto consider how they arose in evolution. The most fundamental of all such prob-lems is the expression of hereditary information, which today requires extraor-dinarily complex machinery and proceeds from DNA to protein through an RNAintermediate. How did this machinery arise? One view is that an RNA worldexisted on Earth before modern cells arose (Figure 6–91). According to thishypothesis, RNA both stored genetic information and catalyzed the chemicalreactions in primitive cells. Only later in evolutionary time did DNA take over asthe genetic material and proteins become the major catalyst and structuralcomponent of cells. If this idea is correct, then the transition out of the RNAworld was never complete; as we have seen in this chapter, RNA still catalyzesseveral fundamental reactions in modern-day cells, which can be viewed asmolecular fossils of an earlier world.

In this section we outline some of the arguments in support of the RNAworld hypothesis. We will see that several of the more surprising features ofmodern-day cells, such as the ribosome and the pre-mRNA splicing machinery,are most easily explained by viewing them as descendants of a complex networkof RNA-mediated interactions that dominated cell metabolism in the RNAworld. We also discuss how DNA may have taken over as the genetic material,how the genetic code may have arisen, and how proteins may have eclipsed RNAto perform the bulk of biochemical catalysis in modern-day cells.

Life Requires Autocatalysis

It has been proposed that the first “biological” molecules on Earth were formedby metal-based catalysis on the crystalline surfaces of minerals. In principle, anelaborate system of molecular synthesis and breakdown (metabolism) couldhave existed on these surfaces long before the first cells arose. But life requiresmolecules that possess a crucial property: the ability to catalyze reactions thatlead, directly or indirectly, to the production of more molecules like themselves.Catalysts with this special self-promoting property can use raw materials to

THE RNA WORLD AND THE ORIGINS OF LIFE 365

Page 68: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

reproduce themselves and thereby divert these same materials from the pro-duction of other substances. But what molecules could have had such autocat-alytic properties in early cells? In present-day cells the most versatile catalystsare polypeptides, composed of many different amino acids with chemicallydiverse side chains and, consequently, able to adopt diverse three-dimensionalforms that bristle with reactive chemical groups. But, although polypeptides areversatile as catalysts, there is no known way in which one such molecule canreproduce itself by directly specifying the formation of another of precisely thesame sequence.

Polynucleotides Can Both Store Information and Catalyze Chemical Reactions

Polynucleotides have one property that contrasts with those of polypeptides:they can directly guide the formation of exact copies of their own sequence. Thiscapacity depends on complementary base pairing of nucleotide subunits, whichenables one polynucleotide to act as a template for the formation of another. Aswe have seen in this and the preceding chapter, such complementary templat-ing mechanisms lie at the heart of DNA replication and transcription in modern-day cells.

But the efficient synthesis of polynucleotides by such complementary tem-plating mechanisms requires catalysts to promote the polymerization reaction:without catalysts, polymer formation is slow, error-prone, and inefficient. Today,template-based nucleotide polymerization is rapidly catalyzed by proteinenzymes—such as the DNA and RNA polymerases. How could it be catalyzedbefore proteins with the appropriate enzymatic specificity existed? The begin-nings of an answer to this question were obtained in 1982, when it was discov-ered that RNA molecules themselves can act as catalysts. We have seen in thischapter, for example, that a molecule of RNA is the catalyst for the peptidyltransferase reaction that takes place on the ribosome. The unique potential ofRNA molecules to act both as information carrier and as catalyst forms the basisof the RNA world hypothesis.

RNA therefore has all the properties required of a molecule that could cat-alyze its own synthesis (Figure 6–92). Although self-replicating systems of RNAmolecules have not been found in nature, scientists are hopeful that they can beconstructed in the laboratory. While this demonstration would not prove thatself-replicating RNA molecules were essential in the origin of life on Earth, itwould certainly suggest that such a scenario is possible.

A Pre-RNA World Probably Predates the RNA World

Although RNA seems well suited to form the basis for a self-replicating set of bio-chemical catalysts, it is unlikely that RNA was the first kind of molecule to do so.

366 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

catalysis

Figure 6–92 An RNA molecule that can catalyze its own synthesis.This hypothetical process would require catalysis of the production of botha second RNA strand of complementary nucleotide sequence and the use ofthis second RNA molecule as a template to form many molecules of RNAwith the original sequence.The red rays represent the active site of thishypothetical RNA enzyme.

present10 5

solarsystemformed

RNAWORLD

firstcellswithDNA

firstmammals

15 billion years ago

big bang

Figure 6–91 Time line for theuniverse, suggesting the earlyexistence of an RNA world of livingsystems.

Page 69: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

From a purely chemical standpoint, it is difficult to imagine how long RNAmolecules could be formed initially by purely nonenzymatic means. For onething, the precursors of RNA, the ribonucleotides, are difficult to form nonenzy-matically. Moreover, the formation of RNA requires that a long series of 3¢ to 5¢phosphodiester linkages form in the face of a set of competing reactions, includ-ing hydrolysis, 2¢ to 5¢ linkages, 5¢ to 5¢ linkages, and so on. Given these problems,it has been suggested that the first molecules to possess both catalytic activityand information storage capabilities may have been polymers that resembleRNA but are chemically simpler (Figure 6–93). We do not have any remnants ofthese compounds in present-day cells, nor do such compounds leave fossilrecords. Nonetheless, the relative simplicity of these “RNA-like polymers” makethem better candidates than RNA itself for the first biopolymers on Earth thathad both information storage capacity and catalytic activity.

The transition between the pre-RNA world and the RNA world would haveoccurred through the synthesis of RNA using one of these simpler compoundsas both template and catalyst. The plausibility of this scheme is supported bylaboratory experiments showing that one of these simpler forms (PNA—see Fig-ure 6–93) can act as a template for the synthesis of complementary RNAmolecules, because the overall geometry of the bases is similar in the twomolecules. Presumably, pre-RNA polymers also catalyzed the formation ofribonucleotide precursors from simpler molecules. Once the first RNAmolecules had been produced, they could have diversified gradually to take overthe functions originally carried out by the pre-RNA polymers, leading eventuallyto the postulated RNA world.

THE RNA WORLD AND THE ORIGINS OF LIFE 367

Figure 6–93 Structures of RNA and two related information-carrying polymers. In each case, B indicates the positions of purine andpyrimidine bases.The polymer p-RNA (pyranosyl-RNA) is RNA in which thefuranose (five-membered ring) form of ribose has been replaced by thepyranose (six-membered ring) form. In PNA (peptide nucleic acid), theribose phosphate backbone of RNA has been replaced by the peptidebackbone found in proteins. Like RNA, both p-RNA and PNA can formdouble helices through complementary base-pairing, and each couldtherefore in principle serve as a template for its own synthesis (see Figure 6–92).

O

BHO–O O

O

O

P

O

O

NH

B

OH

O

O

O

B

OH

O

O–O P

O

O

O

B

OH

O–O P

O

O–O P

OO

BHO–O O

O

P

O

BHO–O O

O

O

P

O

N

B

O

O NH

N

B

O

O NH

N

B

O

O

RNA p-RNA PNA

Page 70: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

368 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

Figure 6–94 Common elements ofRNA secondary structure.Conventional, complementary base-pairinginteractions are indicated by red “rungs” indouble-helical portions of the RNA.

Figure 6–95 Examples of RNAtertiary interactions. Some of theseinteractions can join distant parts of thesame RNA molecule or bring twoseparate RNA molecules together.

5¢3¢

3¢5¢

5¢ 3¢

3¢5¢

3¢5¢

single strand

symmetricinternal loop

asymmetricinternal loop

two-stem junctionor coaxial stack

three-stem junction four-stem junction

double strand hairpin loopsingle-nucleotide

bulge

three-nucleotide

bulge

5¢5¢

3¢pseudoknot kissing hairpins hairpin loop-bulge contact

Single-stranded RNA Molecules Can Fold into Highly Elaborate Structures

We have seen that complementary base-pairing and other types of hydrogenbonds can occur between nucleotides in the same chain, causing an RNAmolecule to fold up in a unique way determined by its nucleotide sequence (see,for example, Figures 6–6, 6–52, and 6–67). Comparisons of many RNA structureshave revealed conserved motifs, short structural elements that are used over andover again as parts of larger structures. Some of these RNA secondary structuralmotifs are illustrated in Figure 6–94. In addition, a few common examples ofmore complex and often longer-range interactions, known as RNA tertiary inter-actions, are shown in Figure 6–95.

Protein catalysts require a surface with unique contours and chemical prop-erties on which a given set of substrates can react (discussed in Chapter 3). Inexactly the same way, an RNA molecule with an appropriately folded shape canserve as an enzyme (Figure 6–96). Like some proteins, many of these ribozymeswork by positioning metal ions at their active sites. This feature gives them awider range of catalytic activities than can be accounted for solely by the limitedchemical groups of the polynucleotide chain.

Relatively few catalytic RNAs exist in modern-day cells, however, and muchof our inference about the RNA world has come from experiments in which largepools of RNA molecules of random nucleotide sequences are generated in thelaboratory. Those rare RNA molecules with a property specified by the experi-menter are then selected out and studied (Figure 6–97). Experiments of this type

Page 71: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

THE RNA WORLD AND THE ORIGINS OF LIFE 369

Figure 6–96 (above) A ribozyme. Thissimple RNA molecule catalyzes thecleavage of a second RNA at a specificsite.This ribozyme is found embedded inlarger RNA genomes—called viroids—which infect plants.The cleavage, whichoccurs in nature at a distant location onthe same RNA molecule that contains theribozyme, is a step in the replication ofthe viroid genome.Although not shown inthe figure, the reaction requires amolecule of Mg positioned at the activesite. (Adapted from T.R. Cech and O.C. Uhlenbeck, Nature 372:39–40, 1994.)

ribozyme

ribozyme

+

+

substrateRNA

BASE-PAIRING BETWEENRIBOZYME AND SUBSTRATE

SUBSTRATE CLEAVAGE

cleavedRNA

PRODUCT RELEASE

Figure 6–97 (left) In vitro selection ofa synthetic ribozyme. Beginning with alarge pool of nucleic acid moleculessynthesized in the laboratory, those rareRNA molecules that possess a specifiedcatalytic activity can be isolated andstudied.Although a specific example (thatof an autophosphorylating ribozyme) isshown, variations of this procedure havebeen used to generate many of theribozymes listed in Table 6–4. During theautophosphorylation step, the RNAmolecules are sufficiently dilute to preventthe “cross”-phosphorylation of additionalRNA molecules. In reality, severalrepetitions of this procedure arenecessary to select the very rare RNAmolecules with catalytic activity.Thus thematerial initially eluted from the column isconverted back into DNA, amplified manyfold (using reverse transcriptase and PCRas explained in Chapter 8), transcribedback into RNA, and subjected to repeatedrounds of selection. (Adapted from J.R. Lorsch and J.W. Szostak, Nature371:31–36, 1994.)ATP

ADP

large pool of double-stranded DNA molecules,each with a different, randomly generated

nucleotide sequence

large pool of single-stranded RNA molecules,each with a different, randomly generated

nucleotide sequence

only the rare RNA molecules able tophosphorylate themselves incorporate sulfur

rare RNA moleculesthat have kinase activity

TRANSCRIPTION BYRNA POLYMERASEAND FOLDING OFRNA MOLECULES

ADDITION OFATP DERIVATIVECONTAINING A

SULFUR IN PLACEOF AN OXYGEN

CAPTURE OFPHOSPHORYLATED

MATERIAL ONCOLUMN MATERIAL

THAT BINDS TIGHTLYTO THE SULFUR

GROUP

ELUTION OFBOUND MOLECULES

g S

P O

O

O–S

P O

O

O–S

discard RNAmoleculesthat fail tobind to thecolumn

Page 72: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

have created RNAs that can catalyze a wide variety of biochemical reactions(Table 6–4), and suggest that the main difference between protein enzymes andribozymes lies in their maximum reaction speed, rather than in the diversity ofthe reactions that they can catalyze.

Like proteins, RNAs can undergo allosteric conformational changes, eitherin response to small molecules or to other RNAs. One artificially createdribozyme can exist in two entirely different conformations, each with a differentcatalytic activity (Figure 6–98). Moreover, the structure and function of therRNAs in the ribosome alone have made it clear that RNA is an enormously ver-satile molecule. It is therefore easy to imagine that an RNA world could reach ahigh level of biochemical sophistication.

Self-Replicating Molecules Undergo Natural Selection

The three-dimensional folded structure of a polynucleotide affects its stability,its actions on other molecules, and its ability to replicate. Therefore, certainpolynucleotides will be especially successful in any self-replicating mixture.Because errors inevitably occur in any copying process, new variant sequencesof these polynucleotides will be generated over time.

370 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

TABLE 6–4 Some Biochemical Reactions That Can Be Catalyzedby Ribozymes

ACTIVITY RIBOZYMES

Peptide bond formation in ribosomal RNAprotein synthesis

RNA cleavage, RNA ligation self-splicing RNAs; also in vitro selected RNA

DNA cleavage self-splicing RNAs

RNA splicing self-splicing RNAs, perhaps RNAs of the spliceosome

RNA polymerizaton in vitro selected RNA

RNA and DNA phosphorylation in vitro selected RNA

RNA aminoacylation in vitro selected RNA

RNA alkylation in vitro selected RNA

Amide bond formation in vitro selected RNA

Amide bond cleavage in vitro selected RNA

Glycosidic bond formation in vitro selected RNA

Porphyrin metalation in vitro selected RNA

Figure 6–98 An RNA molecule thatfolds into two different ribozymes.This 88-nucleotide RNA, created in thelaboratory, can fold into a ribozyme thatcarries out a self-ligation reaction (left) ora self-cleavage reaction (right). The ligationreaction forms a 2¢,5¢ phosphodiesterlinkage with the release of pyrophosphate.This reaction seals the gap (gray shading),which was experimentally introduced intothe RNA molecule. In the reaction carriedout by the HDV fold, the RNA is cleavedat this same position, indicated by thearrowhead.This cleavage resembles thatused in the life cycle of HDV, a hepatitis Bsatellite virus, hence the name of the fold.Each nucleotide is represented by acolored dot, with the colors used simply toclarify the two different folding patterns.The folded structures illustrate thesecondary structures of the tworibozymes with regions of base-pairingindicated by close oppositions of thecolored dots. Note that the two ribozymefolds have no secondary structure incommon. (Adapted from E.A. Schultes and D.P. Bartel, Science 289:448–452, 2000.)

ppp

2¢HO3¢

Ligase fold HDV fold

Page 73: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

Certain catalytic activities would have had a cardinal importance in the earlyevolution of life. Consider in particular an RNA molecule that helps to catalyzethe process of templated polymerization, taking any given RNA molecule as atemplate. (This ribozyme activity has been directly demonstrated in vitro, albeitin a rudimentary form that can only synthesize moderate lengths of RNA.) Sucha molecule, by acting on copies of itself, can replicate. At the same time, it canpromote the replication of other types of RNA molecules in its neighborhood(Figure 6–99). If some of these neighboring RNAs have catalytic actions that helpthe survival of RNA in other ways (catalyzing ribonucleotide production, forexample), a set of different types of RNA molecules, each specialized for a dif-ferent activity, may evolve into a cooperative system that replicates with unusu-ally great efficiency.

One of the crucial events leading to the formation of effective self-replicat-ing systems must have been the development of individual compartments. Forexample, a set of mutually beneficial RNAs (such as those of Figure 6–99) couldreplicate themselves only if all the RNAs were to remain in the neighborhood ofthe RNA that is specialized for templated polymerization. Moreover, if theseRNAs were free to diffuse among a large population of other RNA molecules,they could be co-opted by other replicating systems, which would then competewith the original RNA system for raw materials. Selection of a set of RNAmolecules according to the quality of the self-replicating systems they generatedcould not occur efficiently until some form of compartment evolved to containthem and thereby make them available only to the RNA that had generatedthem. An early, crude form of compartmentalization may have been simpleadsorption on surfaces or particles.

The need for more sophisticated types of containment is easily fulfilled by aclass of small molecules that has the simple physicochemical property of beingamphipathic, that is, consisting of one part that is hydrophobic (water insolu-ble) and another part that is hydrophilic (water soluble). When such moleculesare placed in water they aggregate, arranging their hydrophobic portions asmuch in contact with one another as possible and their hydrophilic portions incontact with the water. Amphipathic molecules of appropriate shape sponta-neously aggregate to form bilayers, creating small closed vesicles whose aqueouscontents are isolated from the external medium (Figure 6–100). The phe-nomenon can be demonstrated in a test tube by simply mixing phospholipidsand water together: under appropriate conditions, small vesicles will form. Allpresent-day cells are surrounded by a plasma membrane consisting of amphi-pathic molecules—mainly phospholipids—in this configuration; we discussthese molecules in detail in Chapter 10.

Presumably, the first membrane-bounded cells were formed by the sponta-neous assembly of a set of amphipathic molecules, enclosing a self-replicatingmixture of RNA (or pre-RNA) and other molecules. It is not clear at what point inthe evolution of biological catalysts this first occurred. In any case, once RNAmolecules were sealed within a closed membrane, they could begin to evolve inearnest as carriers of genetic instructions: they could be selected not merely onthe basis of their own structure, but also according to their effect on the othermolecules in the same compartment. The nucleotide sequences of the RNAmolecules could now be expressed in the character of a unitary living cell.

How Did Protein Synthesis Evolve?

The molecular processes underlying protein synthesis in present-day cells seeminextricably complex. Although we understand most of them, they do not makeconceptual sense in the way that DNA transcription, DNA repair, and DNA

THE RNA WORLD AND THE ORIGINS OF LIFE 371

Figure 6–99 A family of mutuallysupportive RNA molecules, onecatalyzing the reproduction of theothers.

Figure 6–100 Formation of membrane by phospholipids. Becausethese molecules have hydrophilic heads and lipophilic tails, they alignthemselves at an oil/water interface with their heads in the water and theirtails in the oil. In the water they associate to form closed bilayer vesicles inwhich the lipophilic tails are in contact with one another and the hydrophilicheads are exposed to the water.

OIL

phospholipidmonolayer

OILWATER

phospholipidbilayer

Page 74: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

replication do. It is especially difficult to imagine how protein synthesis evolvedbecause it is now performed by a complex interlocking system of protein andRNA molecules; obviously the proteins could not have existed until an early ver-sion of the translation apparatus was already in place. Although we can onlyspeculate on the origins of protein synthesis and the genetic code, several exper-imental approaches have provided possible scenarios.

In vitro RNA selection experiments of the type summarized previously inFigure 6–97 have produced RNA molecules that can bind tightly to amino acids.The nucleotide sequences of these RNAs often contain a disproportionately highfrequency of codons for the amino acid that is recognized. For example, RNAmolecules that bind selectively to arginine have a preponderance of Arg codonsand those that bind tyrosine have a preponderance of Tyr codons. This correla-tion is not perfect for all the amino acids, and its interpretation is controversial,but it raises the possibility that a limited genetic code could have arisen from thedirect association of amino acids with specific sequences of RNA, with RNAsserving as a crude template to direct the non-random polymerization of a fewdifferent amino acids. In the RNA world described previously, any RNA thathelped guide the synthesis of a useful polypeptide would have a great advantagein the evolutionary struggle for survival.

In present-day cells, tRNA adaptors are used to match amino acids tocodons, and proteins catalyze tRNA aminoacylation. However, ribozymes createdin the laboratory can perform specific tRNA aminoacylation reactions, so it isplausible that tRNA-like adaptors could have arisen in an RNA world. Thisdevelopment would have made the matching of “mRNA” sequences to aminoacids more efficient, and it perhaps allowed an increase in the number of aminoacids that could be used in templated protein synthesis.

Finally, the efficiency of early forms of protein synthesis would be increaseddramatically by the catalysis of peptide bond formation. This evolutionarydevelopment presents no conceptual problem since, as we have seen, this reac-tion is catalyzed by rRNA in present-day cells. One can envision a crude peptidyltransferase ribozyme, which, over time, grew larger and acquired the ability toposition charged tRNAs accurately on RNA templates—leading eventually to themodern ribosome. Once protein synthesis evolved, the transition to a protein-dominated world could proceed, with proteins eventually taking over themajority of catalytic and structural tasks because of their greater versatility, with20 rather than 4 different subunits.

All Present-day Cells Use DNA as Their Hereditary Material

The cells of the RNA world would presumably have been much less complex andless efficient in reproducing themselves than even the simplest present-daycells, since catalysis by RNA molecules is less efficient than that by proteins.They would have consisted of little more than a simple membrane enclosing aset of self-replicating molecules and a few other components required to pro-vide the materials and energy for their replication. If the evolutionary specula-tions about RNA outlined above are correct, these early cells would also havediffered fundamentally from the cells we know today in having their hereditaryinformation stored in RNA rather than in DNA (Figure 6–101).

Evidence that RNA arose before DNA in evolution can be found in the chem-ical differences between them. Ribose, like glucose and other simple carbohy-drates, can be formed from formaldehyde (HCHO), a simple chemical which isreadily produced in laboratory experiments that attempt to simulate conditionson the primitive Earth. The sugar deoxyribose is harder to make, and in present-day cells it is produced from ribose in a reaction catalyzed by a protein enzyme,suggesting that ribose predates deoxyribose in cells. Presumably, DNA appearedon the scene later, but then proved more suitable than RNA as a permanentrepository of genetic information. In particular, the deoxyribose in its sugar-phosphate backbone makes chains of DNA chemically more stable thanchains of RNA, so that much greater lengths of DNA can be maintained with-out breakage.

372 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

RNA-based systems

pre-RNA-based systems

RNA

DNA

EVOLUTION OF RNAs THATCAN DIRECT PROTEIN SYNTHESIS

pre-RNA

REPLACEMENT OF PRE-RNA BY RNA

RNA and protein-based systems

EVOLUTION OF NEW ENZYMESTHAT REPLICATE DNA AND MAKE RNA COPIES FROM IT

present-day cells

RNA

RNA

protein

protein

Figure 6–101 The hypothesis thatRNA preceded DNA and proteins inevolution. In the earliest cells, pre-RNAmolecules would have had combinedgenetic, structural, and catalytic functionsand these functions would have graduallybeen replaced by RNA. In present-daycells, DNA is the repository of geneticinformation, and proteins perform the vastmajority of catalytic functions in cells.RNA primarily functions today as a go-between in protein synthesis, althoughit remains a catalyst for a number ofcrucial reactions.

Page 75: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

The other differences between RNA and DNA—the double-helical structureof DNA and the use of thymine rather than uracil—further enhance DNA stabil-ity by making the many unavoidable accidents that occur to the molecule mucheasier to repair, as discussed in detail in Chapter 5 (see pp. 269–272).

Summary

From our knowledge of present-day organisms and the molecules they contain, itseems likely that the development of the directly autocatalytic mechanisms funda-mental to living systems began with the evolution of families of molecules that couldcatalyze their own replication. With time, a family of cooperating RNA catalystsprobably developed the ability to direct synthesis of polypeptides. DNA is likely tohave been a late addition: as the accumulation of additional protein catalystsallowed more efficient and complex cells to evolve, the DNA double helix replacedRNA as a more stable molecule for storing the increased amounts of genetic infor-mation required by such cells.

THE RNA WORLD AND THE ORIGINS OF LIFE 373

ReferencesGeneralGesteland RF, Cech TR & Atkins JF (eds) (1999) The RNA World, 2nd edn.

Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press.Hartwell L, Hood L, Goldberg ML et al. (2000) Genetics: from Genes to

Genomes. Boston: McGraw Hill.Lewin B (2000) Genes VII. Oxford: Oxford University Press.Lodish H, Berk A, Zipursky SL et al. (2000) Molecular Cell Biology, 4th edn.

New York:WH Freeman.Stent GS (1971) Molecular Genetics: An Introductory Narrative. San Fran-

cisco:WH Freeman.Stryer L (1995) Biochemistry, 4th edn. New York:WH Freeman.Watson JD, Hopkins NH, Roberts JW et al. (1987) Molecular Biology of the

Gene, 4th edn. Menlo Park, CA: Benjamin/Cummings.

From DNA to RNABerget SM, Moore C & Sharp PA (1977) Spliced segments at the 5¢ termi-

nus of adenovirus 2 late mRNA. Proc. Natl. Acad. Sci. USA 74, 3171–3175.Black DL (2000) Protein diversity from alternative splicing: a challenge for

bioinformatics and post-genome biology. Cell 103, 367–370.Brenner S, Jacob F & Meselson M (1961) An unstable intermediate carry-

ing information from genes to ribosomes for protein synthesis. Nature190, 576–581.

Cech TR (1990) Nobel lecture. Self-splicing and enzymatic activity of anintervening sequence RNA from Tetrahymena. Biosci. Rep. 10, 239–261.

Chow LT, Gelinas RE, Broker TR et al. (1977) An amazing sequence arrange-ment at the 5¢ ends of adenovirus 2 messenger RNA. Cell 12, 1–8.

Conaway JW, Shilatifard A, Dvir A & Conaway RC (2000) Control of elon-gation by RNA polymerase II. Trends Biochem. Sci. 25, 375–380.

Cramer P, Bushnell DA, Fu J et al. (2000) Architecture of RNA polymeraseII and implications for the transcription mechanism. Science 288,640–649.

Crick F (1979) Split genes and RNA splicing. Science 204, 264–271.Daneholt B (1997) A look at messenger RNP moving through the nuclear

pore. Cell 88, 585–588.Darnell JE, Jr. (1985) RNA. Sci. Am. 253(4), 68–78.Dvir A, Conaway JW & Conaway RC (2001) Mechanism of transcription

initiation and promoter escape by RNA polymerase II. Curr. Opin. Genet.Dev. 11, 209–214.

Ebright RH (2000) RNA polymerase: structural similarities between bacte-rial RNA polymerase and eukaryotic RNA polymerase II. J. Mol. Biol. 304,687–698.

Eddy SR (1999) Noncoding RNA genes. Curr. Opin. Genet. Dev. 9, 695–699.

Green MR (2000) TBP-associated factors (TAFIIs): multiple, selective tran-scriptional mediators in common complexes. Trends Biochem. Sci. 25,59–63.

Harley CB & Reynolds RP (1987) Analysis of E. coli promoter sequences.Nucleic Acids Res. 15, 2343–2361.

Hirose Y & Manley JL (2000) RNA polymerase II and the integration ofnuclear events. Genes Dev. 14, 1415–1429.

Kadonaga JT (1998) Eukaryotic transcription: an interlaced network oftranscription factors and chromatin-modifying machines. Cell 92,307–313.

Lewis JD & Tollervey D (2000) Like attracts like: getting RNA processingtogether in the nucleus. Science 288, 1385–1389.

Lisser S & Margalit H (1993) Compilation of E. coli mRNA promotersequences. Nucleic Acids Res. 21, 1507–1516.

Minvielle-Sebastia L & Keller W (1999) mRNA polyadenylation and itscoupling to other RNA processing reactions and to transcription. Curr.Opin. Cell Biol. 11, 352–357.

Mooney RA & Landick R (1999) RNA polymerase unveiled. Cell 98,687–690.

Olson MO, Dundr M & Szebeni A (2000) The nucleolus: an old factory withunexpected capabilities. Trends Cell Biol. 10, 189–196.

Proudfoot N (2000) Connecting transcription to messenger RNA pro-cessing. Trends Biochem. Sci. 25, 290–293.

Reed R (2000) Mechanisms of fidelity in pre-mRNA splicing. Curr. Opin. CellBiol. 12, 340–345.

Roeder RG (1996) The role of general initiation factors in transcription byRNA polymerase II. Trends Biochem. Sci. 21, 327–335.

Shatkin AJ & Manley JL (2000) The ends of the affair : capping andpolyadenylation. Nat. Struct. Biol. 7, 838–842.

Smith CW & Valcarcel J (2000) Alternative pre-mRNA splicing: the logic ofcombinatorial control. Trends Biochem. Sci. 25, 381–388.

Staley JP & Guthrie C (1998) Mechanical devices of the spliceosome:motors, clocks, springs, and things. Cell 92, 315–326.

Tarn WY & Steitz JA (1997) Pre-mRNA splicing: the discovery of a newspliceosome doubles the challenge. Trends Biochem. Sci. 22, 132–137.

von Hippel PH (1998) An integrated model of the transcription complexin elongation, termination, and editing. Science 281, 660–665.

From RNA to ProteinAbelson J, Trotta CR & Li H (1998) tRNA splicing. J. Biol. Chem. 273,

12685–12688.Anfinsen CB (1973) Principles that govern the folding of protein chains. Sci-

ence 181, 223–230.

Page 76: HOW CELLS READ THE FROM DNA TO RNA GENOME: FROM THE …mcb.berkeley.edu/courses/mcb110/Albertsch06.pdf · HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN 6 FROM DNA TO RNA FROM RNA

374 Chapter 6 : HOW CELLS READ THE GENOME: FROM DNA TO PROTEIN

Cohen FE (1999) Protein misfolding and prion diseases. J. Mol. Biol. 293,313–320.

Crick FHC (1966) The genetic code: III. Sci. Am. 215(4), 55–62.Fedorov AN & Baldwin TO (1997) Cotranslational protein folding. J. Biol.

Chem. 272, 32715–32718.Frank J (2000) The ribosome—a macromolecular machine par excellence.

Chem. Biol. 7, R133–141.Green R (2000) Ribosomal translocation: EF-G turns the crank. Curr. Biol.

10, R369–373.Hartl FU (1996) Molecular chaperones in cellular protein folding. Nature

381, 571–580.Hershko A, Ciechanover A & Varshavsky A (2000) The ubiquitin system.

Nat. Med. 6, 1073–1081.Ibba M & Soll D (2000) Aminoacyl-tRNA synthesis. Annu. Rev. Biochem. 69,

617–650.Kozak M (1999) Initiation of translation in prokaryotes and eukaryotes.

Gene 234, 187–208.Nissen P, Hansen J, Ban N et al. (2000) The structural basis of ribosome

activity in peptide bond synthesis. Science 289, 920–930.Nureki O,Vassylyev DG,Tateno M et al. (1998) Enzyme structure with two

catalytic sites for double-sieve selection of substrate. Science 280,578–582.

Prusiner SB (1998) Nobel lecture. Prions. Proc. Natl. Acad. Sci. USA 95,13363–13383.

Rich A & Kim SH (1978) The three-dimensional structure of transfer RNA.Sci. Am. 238(1), 52–62.

Sachs AB & Varani G (2000) Eukaryotic translation initiation: there are (atleast) two side to every story. Nat. Struct. Biol. 7, 356–361.

The Genetic Code. (1966) Cold Spring Harbor Symposium on Quantita-tive Biology, vol XXXI. Cold Spring Harbor, NY: Cold Spring Harbor Lab-oratory Press.

Turner GC & Varshavsky A (2000) Detecting and measuring cotranslation-al protein degradation in vivo. Science 289, 2117–2120.

Varshavsky A,Turner G, Du F et al. (2000) The ubiquitin system and the N-end rule pathway. Biol. Chem. 381, 779–789.

Voges D, Zwickl P & Baumeister W (1999) The 26S proteasome: a molec-ular machine designed for controlled proteolysis. Annu. Rev. Biochem. 68,1015–1068.

Wilson KS & Noller HF (1998) Molecular movement inside the transla-tional engine. Cell 92, 337–349.

Wimberly BT, Brodersen DE, Clemons WM et al. (2000) Structure of the30S ribosomal subunit. Nature 407, 327–339.

Yusupov MM, Yusupova GZ, Baucom A et al. (2001) Crystal structure ofthe ribosome at 5.5A resolution. Science 292, 883–896.

The RNA World and the Origins of LifeBartel DP & Unrau PJ (1999) Constructing an RNA world. Trends Cell Biol.

9, M9–M13.Joyce GF (1992) Directed molecular evolution. Sci. Am. 267(6), 90–97.Knight RD & Landweber LF (2000) The early evolution of the genetic code.

Cell 101, 569–572.Orgel L (2000) Origin of life. A simpler nucleic acid. Science 290,

1306–1307.Szathmary E (1999) The origin of the genetic code: amino acids as cofac-

tors in an RNA world. Trends Genet. 15, 223–229.