Essentials of Medical Genomics Second Edition Stuart M. Brown NYU School of Medicine New York, NY with Contributions by John G. Hay and Harry Ostrer A John Wiley & Sons, Inc., Publication

  • fm JWBK238/Brown September 4, 2008 11:27 Char Count=

    Essentials of Medical Genomics

    Second Edition

    Stuart M. Brown
    NYU School of Medicine
    New York, NY

    with Contributions by
    John G. Hay and Harry Ostrer

    A John Wiley & Sons, Inc., Publication

    Copyright © 2009 by John Wiley & Sons, Inc. All rights reserved.

    Wiley-Blackwell is an imprint of John Wiley & Sons, formed by the merger of Wiley’s global Scientific, Technical, and Medical business with Blackwell Publishing.

    Published by John Wiley & Sons, Inc., Hoboken, New Jersey
    Published simultaneously in Canada

    No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400, fax 978-750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, 201-748-6011, fax 201-748-6008, or online at http://www.wiley.com/go/permission.

    Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

    For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at 877-762-2974, outside the United States at 317-572-3993 or fax 317-572-4002.

    Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

    Library of Congress Cataloging-in-Publication Data:

    Essentials of medical genomics / Stuart M. Brown ; with contributions by John G. Hay and Harry Ostrer.

    p. ; cm.
    Includes bibliographical references and index.
    ISBN 978-0-470-14019-2 (cloth)

    1. Medical genetics. 2. Genomics.

    Printed in the United States of America

    10 9 8 7 6 5 4 3 2 1

    Contents

    Preface

    1 Introduction to Molecular Genetics
    The Principles of Inheritance; Genes Are Made of DNA; DNA Structure; The Central Dogma; References

    2 Molecular Biology Technology
    Cut, Copy, and Paste; Restriction Enzymes; DNA Cloning Is Copying; PCR Is Cloning without the Bacteria; DNA Sequencing; References

    3 Genome Databases
    Genome Sequencing; Entrez; BLAST; Genome Annotation; Genome Browser; Human Genetic Diseases; A System for Naming Genes; Model Organisms (Comparative Genomics); Sequencing Other Genomes; References

    4 Bioinformatics Tools
    Patterns and Tools; Sequence Comparison; Multiple Alignment; Pattern Finding; Phylogenetics; Biotechnology Exercise; References

    5 Human Genetic Variation
    Mutation; Single-Nucleotide Polymorphisms; Linkage; Multigene Diseases; Genetic Testing; SNP Chips; The HapMap Project; Research Uses of SNP Markers; Ethnicity and Genome Diversity; References

    6 Genetic Testing for the Practitioner (Harry Ostrer)
    Clinical Applications of Genetic Testing; Methods of Genetic Testing; Adequacy of Genetic Testing; Informed Consent; Genetic Counseling; Clinical Vignettes; References

    7 Gene Therapy (John G. Hay)
    Historical Perspective; Strategies of Gene Therapy; DNA Elements for Gene Expression; Gene Delivery Systems; Targeting Gene Delivery; Formative Years and Initial Clinical Approaches; The Problems; The Future; References

    8 Microarrays
    Spotting versus Synthesis on the Chip; Other Types of Arrays; Differential Gene Expression; Error and Reliability; Evolutionary Perspectives; References

    9 Analysis of Microarray Data
    Experimental Design; Data Analysis Workflow; Functional Analysis; Validation; References

    10 Pharmacogenomics and Toxicogenomics
    Pharmacogenomics; Environmental Chemicals; Toxicogenomics for Drug Development; References

    11 Clinical Research Informatics
    Clinical Databases; Clinical Trials Management; Data Standards and Ontologies; Tissue Banks; Application to Medical Practice; References

    12 RNA Interference and MicroRNAs
    Antisense RNA; RNA Interference; Ribozymes; References

    13 Alternative Splicing
    Exon Arrays; Medical Applications of Alternative Splicing; References

    14 Genome Tiling Chips
    Genome Chips; Resequencing Chips; Whole-Genome Transcription Profiling; ChIP-chip; ArrayCGH; References

    15 Cancer Genomics
    Understanding Cancer Genomics; Copy Number Mutations; Gene Expression Signatures; Cancer Genome Atlas; References

    16 Proteomics
    Protein Modifications; Quantitative Approaches; Biomarkers; Protein Databases; Protein–Protein Interactions; DNA-Binding Proteins; Structural Proteomics; Drug Targets; References

    17 Consumer Genomics and Genealogy
    Genealogy; Nutrigenomics; Privacy Concerns; References

    18 The Ethics of Medical Genomics
    Eugenics; Human Genome Diversity Project and Population Genetics; Genetic Discrimination; Impact on Physicians and Researchers; Clinical Research; References

    Appendix: Genetic Testing: Scientific Background for Policymakers (Amanda K. Sarata)

    Glossary

    Index

    Preface

    Medical genomics might seem like a rather specialized topic, of interest to just a few researchers and genetics experts, but I believe that it is a technology that is already having an impact on the practice of most primary care physicians and biomedical researchers. Genetic tests are now in use as a diagnostic aid for various types of cancer and will soon be commonplace as an aid to prescribing psychiatric drugs. Some drugs are currently under development that will require a genetic test before they can be prescribed. Consumer genomics is a new development that is disrupting the usual flow of health care information. A patient may arrive at his or her physician’s office armed with a detailed report on their allelic status for thousands of genetic markers that may or may not be relevant to each health care decision. Therefore, I have tried to make this book as accessible and comprehensive as possible in order to provide a working knowledge of medical genomics both for biomedical professionals and consumers of health care.

    However, writing a book about genomics is truly a Sisyphean task, since the goal of reporting current technologies is constantly receding. The book writing process takes about a year from the initial outline to page proofs, and the past year has seen exceptionally rapid progress in genomics technologies. While I was paying attention to genome tiling, copy number, and SNP chips, the revolution in Next-Generation DNA sequencing has snuck up on me. High-throughput DNA sequencing is the kind of disruptive technology that enables new kinds of scientific research. The cost of sequencing a whole human genome has dropped from several million dollars to about $100,000, and it is likely to be cut by tenfold again by 2009. New and unexpected applications for sequencing technology are being developed almost daily.

    I have included, as an appendix to this book, a short report written by Amanda K. Sarata of the Congressional Research Service. It is valuable not just because it provides a nice summary of the science that underlies genetic testing and the related public policy issues, but also because it demonstrates the level of genetics information to which our Congressional Representatives have been exposed.

    The Genetic Information Nondiscrimination Act (GINA) was finally passed by the US Congress and signed into law by President Bush in May of 2008. We all await the many social ramifications of this legislation.

    STUART M. BROWN
    New York

  • c01 JWBK238/Brown September 5, 2008 19:38 Char Count=

    CHAPTER 1

    Introduction to Molecular Genetics

    The Human Genome Project is a bold undertaking to understand, at a fundamental level, all of the genetic information required to build and maintain a human being. The human genome is the complete information content of the human cell. This information is encoded in approximately 3.2 billion base pairs of DNA contained on 46 chromosomes (22 pairs of autosomes plus the two sex chromosomes; see Figure 1.1). The completion, in 2001, of the first draft of the human genome sequence was only the first phase of this project (Venter et al. 2001; Lander et al. 2001).

    To use the metaphor of a book, the draft genome sequence gives biology all of the letters, in the correct order on the pages, but without the ability to recognize words, sentences, and punctuation, or even an understanding of the language in which the book is written. The task of making sense of all of this raw biological information falls, at least initially, to bioinformatics specialists who make use of computers to find the words and decode the language. The next step is to integrate all of this information into a new form of experimental biology, known as genomics, that can ask meaningful questions about what is happening in very complex systems where tens of thousands of different genes and proteins are interacting simultaneously.

    Essentials of Medical Genomics, Second Edition. By Stuart M. Brown. Copyright © 2009 John Wiley & Sons, Inc.

    FIGURE 1.1. Human karyotype (SKY image): available at http://www.accessexcellence.org/AB/GG/sky.gif; credit to Chroma Technology Inc. (See insert for color representation.)

    The primary justification for the considerable amount of money spent on sequencing the human genome (from governments and private corporations) is that this information will lead to dramatic medical advances. In fact, the first wave of new drugs and medical technologies derived from genome information is currently making its way through clinical trials and into the healthcare system. However, to effectively utilize these new advances, medical professionals need to understand something about genes and genomes. Just as it is important for physicians to understand how to Gram-stain and evaluate a culture of bacteria, even if they never actually perform this test themselves in their medical practices, it is important to understand how DNA technologies work in order to appreciate their strengths, weaknesses, and peculiarities.

    However, before we can discuss whole genomes and genomic technologies, it is necessary to understand the basics of how genes function to control biochemical processes within the cell (molecular biology) and how hereditary information is transmitted from one generation to the next (genetics).

    The Principles of Inheritance

    The principles of genetics were first described by the monk Gregor Mendel in 1866 in his observations of the inheritance of traits in garden peas [“Versuche über Pflanzen-Hybriden” (Mendel 1866)]. Mendel described “differentiating characters” (differierende Merkmale) which may come in several forms. In his monastery garden, he made crosses between strains of garden peas that had different characters, each with two alternate forms that were easily observable, such as purple or white flower color, yellow or green seed color, smooth or wrinkled seed shape, and tall or short plant height. (These alternate forms are now known as alleles.) Then he studied the distribution of these forms in several generations of offspring from his crosses.

    Mendel observed the same patterns of inheritance for each of these characters. Each strain, when bred with itself, showed no changes in any of the characters. In a cross between two strains that differ for a single character, such as pink versus white flowers, the first generation of hybrid offspring (the F1) all resembled one parent (all pink). Mendel called this the dominant form of the character. After self-pollinating the F1 plants, the second-generation plants (the F2) showed a mixture of the two parental forms (see Figure 1.2). This is known as segregation. The recessive form that was not seen in the F1s (white flowers) was found in one-fourth (25%) of the F2 plants.

    FIGURE 1.2. Mendel observed a single trait segregating over two generations. Pink (PP) and white (pp) parents have all pink F1 progeny (heterozygous, Pp), but one-fourth of the F2 generation are white and three-fourths are pink.
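    The segregation pattern described above can be checked by enumerating every equally likely pairing of gametes. The following is a minimal sketch, not Mendel's own notation: the function names `cross` and `phenotype` are hypothetical, and the allele letters P (pink, dominant) and p (white, recessive) follow Figure 1.2.

```python
from collections import Counter
from itertools import product

def cross(parent1, parent2):
    """Count offspring genotypes over all equally likely gamete pairings."""
    # Each parent passes on one of its two alleles; sorting makes "pP" and "Pp" one genotype.
    return Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))

def phenotype(genotype, dominant="P"):
    """A plant is pink if it carries at least one dominant allele, otherwise white."""
    return "pink" if dominant in genotype else "white"

f1 = cross("PP", "pp")   # pink parent x white parent
f2 = cross("Pp", "Pp")   # self-pollinated F1

f2_phenotypes = Counter()
for genotype, count in f2.items():
    f2_phenotypes[phenotype(genotype)] += count

print("F1 genotypes:", dict(f1))
print("F2 genotypes:", dict(f2))
print("F2 phenotypes:", dict(f2_phenotypes))
```

    In this sketch every F1 plant is Pp, and the F2 counts come out as one PP, two Pp, and one pp, which is the three-fourths pink to one-fourth white split described in the text.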

    Mendel also made crosses between strains of peas that differed for two or more traits. He found that each trait was assorted independently in the progeny: there was no connection between whether an F2 plant had the dominant or recessive form for one character and which form it carried for another character (see Figure 1.3).
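    Independent assortment can be illustrated the same way for two traits at once. This is a sketch under stated assumptions: the helper names `gametes` and `phenotype` are hypothetical, S (smooth) and Y (yellow) are taken as the dominant alleles as in Figure 1.3, and s (wrinkled) and y (green) as the recessive ones.

```python
from collections import Counter
from itertools import product

def gametes(genotype):
    """All allele combinations a parent can pass on, one allele per gene.

    genotype is a list of two-letter strings, for example ["Ss", "Yy"].
    """
    return ["".join(combo) for combo in product(*genotype)]

def phenotype(egg, sperm):
    """The dominant form shows if either gamete carries the uppercase allele."""
    shape = "smooth" if "S" in (egg[0], sperm[0]) else "wrinkled"
    color = "yellow" if "Y" in (egg[1], sperm[1]) else "green"
    return shape, color

f1 = ["Ss", "Yy"]                    # F1 plant from SS YY x ss yy
f1_gametes = gametes(f1)             # SY, Sy, sY, sy
# Self-pollination: every egg type paired with every sperm type (16 squares).
f2 = Counter(phenotype(egg, sperm) for egg, sperm in product(f1_gametes, repeat=2))

for traits, count in sorted(f2.items()):
    print(traits, count)
```

    The 16 squares split 9 : 3 : 3 : 1, and the fraction that is both smooth and yellow (9/16) equals the product of the single-trait fractions (3/4 × 3/4), which is what independence between the two characters means.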

    Mendel created a theoretical model (“Mendel’s laws of genetics”) to explain his results. He proposed that each individual has two copies of the hereditary material for each character, which may determine different forms of that character. These two copies separate and are subjected to independent assortment during the formation of gametes (sex cells). When a new individual is created by the fusion of two sex cells, the two copies from the two parents combine to produce a visible trait depending on which form is dominant and which is recessive. Mendel did not propose any physical explanation for how these traits were passed from parent to progeny; his characters were purely abstract units of heredity.

    FIGURE 1.3. A cross where two independent traits are segregating (Y = yellow; S = smooth). Dihybrid cross: the parental generation is SS YY × ss yy, the F1 generation is Ss Yy (selfed), and the F2 generation is shown as a Punnett square of eggs and sperm carrying the gametes SY, Sy, sY, and sy.

    Modern genetics has completely embraced Mendel’s model with some additional detail. There may be more than two different alleles for a gene in a given population, but each individual has only two, which may be the same (homozygous) or different (heterozygous). In some cases two different alleles combine to produce an intermediate form in heterozygous individuals, so that red and white flower alleles may combine to produce pink flowers, or type A and type B blood alleles may combine to produce the AB blood type.

    Genes Are on Chromosomes

    In 1902, Walter Sutton, a microscopist, proposed that Mendel’s heritable characters resided on the chromosomes which he observed inside the cell nucleus (see Figure 1.4). Sutton observed that “the association of paternal and maternal chromosomes in pairs and their subsequent separation during cell division . . . may constitute the physical basis of the Mendelian law of heredity” (Sutton 1903).

    FIGURE 1.4. Anaphase chromosomes in a dividing lily cell. (See insert for color representation.)

    In 1909, the Danish botanist Wilhelm Johannsen coined the term “gene” to describe Mendel’s heritable characters. In 1910, Thomas Hunt Morgan found that a trait for white eye color was located on the X chromosome of the fruit fly and was inherited together with a factor that determines sex (Morgan 1910). A number of subsequent studies by Morgan (1919) and others showed that each gene for a particular trait was located at a specific spot or locus on a chromosome in all individuals of a species. The chromosome was perceived as a linear organization of genes, like beads on a string. Throughout the early part of the twentieth century, a gene was considered to be a single, fundamental, indivisible unit of heredity, in much the same way as an atom was considered to be the fundamental unit of matter.

    Each individual has two copies of each type of chromosome, having received one copy from each parent. The two copies of each chromosome in the parent are randomly divided into the sex cells (sperm and egg) in a process called segregation. It is possible to observe the segregation of chromosomes during meiosis using only a moderately powerful microscope. It is an aesthetically satisfying triumph of biology that this observed segregation of chromosomes in cells exactly corresponds to the segregation of traits that Mendel observed in his peas.

    Recombination and Linkage

    In the early twentieth century, Mendel’s concepts of inherited characters were broadly adopted by practical plant and animal breeders as well as experimental geneticists. It rapidly became clear that Mendel’s experiments represented an oversimplified view of inheritance. He must have intentionally chosen characters in his peas that were inherited independently. In breeding experiments where many traits differ between parents, it is commonly observed that progeny inherit pairs or groups of traits together from one parent far more frequently than would be expected by chance alone. This observation fits nicely into the chromosome model of inheritance: if two genes are located on the same chromosome, then they will be inherited together when that chromosome segregates into a gamete, and that gamete becomes part of a new individual.

    However, it was also observed that “linked” genes do occasionally separate. A theory of recombination was developed to explain these events. During the process of meiosis, it was proposed that the homologous chromosome pairs line up and exchange segments in a process called crossing over. This theory was supported by microscopic evidence of X-shaped structures called chiasmata forming between paired homologous chromosomes in meiotic cells (see Figure 1.5).

    If a parent cell contains two different alleles for two different genes, then after the crossover, the chromosomes will contain new combinations of alleles. For example, if one chromosome contains alleles A and B for two genes, and the other chromosome contains alleles a and b, then without crossovers, all progeny must inherit a chromosome from that parent with either an A–B or an a–b allele combination. If a crossover occurs between the two genes, then the resulting chromosomes will contain the A–b and a–B allele combinations (see Figure 1.6).
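    The A–B / a–b example can be written out as a small string operation. This is only a sketch: each chromosome is modeled as a string of allele letters, and the function name `crossover` and its `point` parameter are hypothetical.

```python
def crossover(chrom1, chrom2, point):
    """Exchange everything after position `point` between two homologous chromosomes."""
    return chrom1[:point] + chrom2[point:], chrom2[:point] + chrom1[point:]

parental = ("AB", "ab")

# A crossover between the two genes produces the new allele combinations.
recombinant = crossover(*parental, point=1)

# An exchange point beyond both genes leaves the parental combinations intact.
no_exchange = crossover(*parental, point=2)

print("recombinant:", recombinant)
print("no exchange:", no_exchange)
```

    With the exchange point between the two genes the result is A–b and a–B; with no exchange between them the gametes still carry A–B and a–b.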

    Morgan, continuing his work with fruit flies, demonstrated that the chance of a crossover occurring between any two linked genes is proportional to the distance between them on the chromosome. Therefore, by counting the frequency of crossovers between alleles of a given pair of genes, it is possible to create genetic maps of chromosomes. Morgan was awarded the 1933 Nobel Prize in Medicine for this work. In fact, it is generally observed that on average there is more than one crossover between every pair of homologous chromosomes in every meiosis, so that two genes located on opposite ends of a chromosome do not appear linked at all. On the other hand, alleles of genes that are located very close together are very rarely separated by recombination (see Figure 1.7).

    FIGURE 1.5. Chiasmata visible in electron micrograph of meiotic chromosome.

    FIGURE 1.6. Schematic diagram of a single crossover between a chromosome with A–B alleles and a chromosome with a–b alleles to form A–b and a–B recombinant chromosomes. (See insert for color representation.)

    FIGURE 1.7. Genes A and B are tightly linked so that they are not separated by recombination, but gene C is farther away. After recombination occurs in some meiotic cells, gametes are produced with allele combinations ABC, abc, ABc, and abC. (See insert for color representation.)
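    Counting recombinant gametes is how such a map is built in practice. The sketch below uses the four gamete types from Figure 1.7 with invented counts; the counts, the function name `recombination_frequency`, and the conversion of 1% recombination to one map unit (a centimorgan, a standard convention not introduced in this chapter) are all assumptions for illustration.

```python
def recombination_frequency(gamete_counts, parental_types):
    """Fraction of gametes whose allele combination differs from both parental types."""
    total = sum(gamete_counts.values())
    recombinants = sum(
        count for gamete, count in gamete_counts.items() if gamete not in parental_types
    )
    return recombinants / total

# Hypothetical counts: ABC and abc are the parental combinations,
# ABc and abC arise from a crossover between gene B and gene C.
counts = {"ABC": 420, "abc": 430, "ABc": 70, "abC": 80}

rf = recombination_frequency(counts, parental_types={"ABC", "abc"})
map_distance_cm = 100 * rf  # assumed convention: 1% recombination is about 1 map unit

print(f"recombination frequency between B and C: {rf:.2%}")
print(f"approximate map distance: {map_distance_cm:.1f} map units")
```

    The proportionality only holds for genes that are fairly close together. As the text notes, distant genes are separated by more than one crossover on average, so the observed recombinant fraction levels off and underestimates the true distance.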

    The relationship between the frequency of recombination between alleles and the distance between genes on a chromosome has been used to construct genetic maps for many different organisms, including humans. It has been a fundamental assumption of genetics for almost a hundred years that recombinations occur randomly along the chromosome at any location, even within genes. However, more recent data from DNA sequencing of genes in human populations suggest that there are recombination hotspots and regions where recombination almost never occurs. This creates groups of alleles from neighboring genes on a chromosome, known as haplotypes, that remain linked together across hundreds of generations.

    Genes Encode Proteins

    Beadle and Tatum (1941) showed that a single mutation, caused by exposing the fungus Neurospora crassa to X rays, destroyed the function of a single enzyme and thereby interrupted a biochemical pathway at a specific step. This mutation segregated among the progeny exactly as Mendel’s traits did in peas. The X-ray-induced damage to a specific region of one chromosome destroyed the instructions for the synthesis of a specific enzyme. Thus a gene is a spot on a chromosome that codes for a single enzyme. In subsequent years, a number of other researchers broadened this concept by showing that genes code for all types of proteins, not just enzymes, leading to the one gene–one protein model, which is the core of modern molecular biology. Beadle and Tatum shared the 1958 Nobel Prize in Medicine.

    Genes Are Made of DNA

    The next step in understanding the nature of the gene was to dissect the chemical structure of the chromosome. Crude biochemical purification had shown that chromosomes are composed of both protein and DNA. Avery et al. (1944) conducted the classic experiment on the “transforming principle.” They found that DNA purified from a lethal S (smooth) form of Streptococcus pneumoniae could transform a harmless R (rough) strain into the S form (see Figure 1.8). Treatment of the DNA with protease to destroy all of the protein had no effect, but treatment with DNA-degrading enzymes blocked the transformation. Therefore, the information that transforms the bacteria from R to S must be contained in the DNA (McCarty 1985).

    FIGURE 1.8. Transforming experiment: rough (II R) and smooth (III S) Streptococcus pneumoniae cells. (From Avery et al., 1944.)

    Hershey and Chase (1952) confirmed the role of DNA with their classic “blender experiment” on bacteriophage viruses. The phage were radioactively labeled with either ³⁵S in their proteins or ³²P in their DNA. They used a blender to interrupt the process of infection of Escherichia coli bacteria by the phage. Then they separated the phage from the infected bacteria by centrifugation and collected the phage and the bacteria separately. They observed that the ³⁵S-labeled protein remained with the phage while the ³²P-labeled DNA was found inside the infected bacteria (see Figure 1.9). This proved that it is the DNA portion of the virus that enters the bacteria and contains the genetic instructions for producing new phage, not the proteins, which remain outside. Hershey was awarded the 1969 Nobel Prize for this work.

    FIGURE 1.9. Hershey–Chase blender experiment. Escherichia coli bacteria are infected with phage with ³⁵S-labeled proteins or ³²P-labeled DNA. After removing the phage with a blender, the ³²P-labeled DNA, but not the ³⁵S-labeled protein, is found inside the bacteria. (From Micklos and Freyer, DNA Science, Cold Spring Harbor Press, 1990.)

    DNA Structure

    Now it was clear that genes are made of DNA, but how does this chemically simple molecule contain so much information?

    FIGURE 1.10. Chemical structures of the four DNA bases. Pyrimidine nucleotides: (a) deoxythymidine monophosphate (dTMP); (b) deoxycytidine monophosphate (dCMP). Purine nucleotides: (c) deoxyadenosine monophosphate (dAMP); (d) deoxyguanosine monophosphate (dGMP).

    DNA is a long polymer molecule that contains a mixture of four different chemical subunits: adenine, cytosine, guanine, and thymine (abbreviated as A, C, G, and T). These subunits, known as nucleotide bases, have similar two-part chemical structures that contain a deoxyribose sugar and a nitrogen ring (see Figure 1.10), hence the name deoxyribonucleic acid. The real challenge is to understand how the nucleotides fit together in a way that can contain a lot of information.

    Chargaff (1950) discovered that there was a consistent one-to-one ratio of adenine to thymine and guanine to cytosine in any sample of DNA from any organism. In 1951, Linus Pauling and R. B. Corey described the α-helical structure of a protein (Pauling and Corey 1951). Shortly thereafter, Rosalind Franklin (Sayre 1975) provided X-ray crystallographic images of DNA to James Watson and Francis Crick (see Figure 1.11); this form of DNA was very similar to the α-helix described by Pauling. Watson and Crick’s crucial insight (1953) was to realize that DNA formed a double helix with complementary bonds between adenine–thymine and guanine–cytosine pairs.

    The Watson–Crick model of the DNA structure resembles a twisted ladder. The two sides of the ladder are formed by strong covalent bonds between the phosphate on the 5′ carbon of one deoxyribose sugar and the hydroxyl group on the 3′ carbon of the next (a phosphodiester bond). Thus, the deoxyribose sugar part of each nucleotide is bonded to the one above and below it, forming a chain that is the backbone of the DNA molecule (see Figure 1.12). The phosphate-to-hydroxyl linkage of the deoxyribose sugars gives the DNA chain a direction or polarity, generally referred to as 5′ to 3′. Each DNA molecule contains two chains that run in opposite directions (antiparallel), forming the sides of the ladder.

    FIGURE 1.11. Rosalind Franklin’s X-ray diffraction image of DNA.

    The rungs of the ladder are formed by weaker hydrogen bonds between the nitrogen ring parts of pairs of nucleotide bases. There are only two types of base pair bonds: adenine bonds with thymine, and guanine bonds with cytosine. The order of nucleotide bases on both sides of the ladder always reflects this complementary base pairing, so that wherever there is an A on one side, there is always a T on the other side, and vice versa.
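    Complementary pairing means that one side of the ladder fully determines the other. The sketch below derives the partner strand for a made-up sequence and then counts bases across both strands; the sequence, the `PAIR` table name, and the function name `complementary_strand` are hypothetical. Because the two chains run in opposite directions, the partner is written by reading the paired bases in reverse order.

```python
from collections import Counter

# Pairing rules from the text: A with T, G with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complementary_strand(strand):
    """Return the partner strand, written 5' to 3'.

    The input is read 5' to 3'; its partner runs the opposite way,
    so the paired bases are collected in reverse order.
    """
    return "".join(PAIR[base] for base in reversed(strand))

strand = "GATTACAGG"                      # hypothetical sequence
partner = complementary_strand(strand)

# Count bases over the whole double-stranded molecule.
counts = Counter(strand + partner)

print("strand :", strand)
print("partner:", partner)
print("A, T, G, C counts:", counts["A"], counts["T"], counts["G"], counts["C"])
```

    Counting over both strands gives equal numbers of A and T and equal numbers of G and C, which is the one-to-one ratio Chargaff measured; the rule need not hold within a single strand.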

    FIGURE 1.12. DNA phosphate bonds.

    Since the A–T and G–C units always occur together, they are often referred to as base pairs. The G–C base pair has three hydrogen bonds, while the A–T pair only has two (see Figure 1.13), so the bonds between G–C bases are more stable at high temperatures than are A–T bonds. The nucleotide bases are strung together on the polydeoxyribose backbone like beads on a string. It is the particular order of the four different bases as they occur along the string that contains all of the biological information.

    Watson and Crick realized that this model of DNA structure contains many implications (see Figure 1.14). First, the two strands of the double helix are complementary, not identical.
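    The difference in hydrogen bonding can be turned into a simple count. This is an illustrative sketch only: it tallies two bonds per A–T pair and three per G–C pair for made-up sequences, and it is not a melting-temperature formula. The names `H_BONDS`, `hydrogen_bonds`, and `gc_fraction` are hypothetical.

```python
# Hydrogen bonds contributed by the base pair that each base takes part in.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}

def hydrogen_bonds(strand):
    """Total hydrogen bonds holding this strand to its complementary partner."""
    return sum(H_BONDS[base] for base in strand)

def gc_fraction(strand):
    """Fraction of positions occupied by G or C."""
    return sum(base in "GC" for base in strand) / len(strand)

at_rich = "ATATATAT"   # hypothetical sequences of equal length
gc_rich = "GCGCGCGC"

for name, seq in (("AT-rich", at_rich), ("GC-rich", gc_rich)):
    print(name, "GC fraction:", gc_fraction(seq), "hydrogen bonds:", hydrogen_bonds(seq))
```

    For two duplexes of the same length, the one with more G–C pairs is held together by more hydrogen bonds, which fits the statement that G–C pairing is more stable at high temperatures.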

    FIGURE 1.13. DNA hydrogen bonds in (a) A–T and (b) G–C base pairs.

    Thus one strand can serve as a template for the synthesis of a new copy of the other strand (a T is added to the new strand wherever there is an A, a G for each C, and so on), perfectly retaining the information in the original double strand. In 1953, in a single-page paper in the journal Nature, they said, with a mastery of understatement: “It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material” (Watson and Crick 1953).
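    The copying mechanism can be sketched as two template-directed syntheses, one per separated strand. The code is a simplified model under stated assumptions: a double helix is a pair of strings each written 5′ to 3′, the sequence is made up, and the function names `synthesize` and `replicate` are hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def synthesize(template):
    """Build a new strand against a template: T opposite A, G opposite C, and so on.

    The result is written 5' to 3', so the template is read in reverse order.
    """
    return "".join(PAIR[base] for base in reversed(template))

def replicate(duplex):
    """Separate the two strands and use each one as a template for a new partner."""
    old1, old2 = duplex
    daughter1 = (old1, synthesize(old1))   # old strand 1 with a newly made partner
    daughter2 = (synthesize(old2), old2)   # newly made strand with old strand 2
    return daughter1, daughter2

parent = ("GATTACAGG", "CCTGTAATC")        # hypothetical double-stranded molecule
daughter1, daughter2 = replicate(parent)

print("parent   :", parent)
print("daughter1:", daughter1)
print("daughter2:", daughter2)
```

    Both daughter molecules carry the same sequence as the parent, and each contains one strand from the original molecule and one newly made strand.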

    So, in one tidy theory, the chemical structure of DNA explains how genetic information is stored on the chromosome and how it is passed on when cells divide. That is why Watson and Crick won the 1962 Nobel Prize (shared with Maurice Wilkins).

    If the two complementary strands of a DNA molecule are separated in the laboratory by boiling (known as denaturing the DNA), then they can find each other and again pair up, by re-forming the complementary A–T and C–G hydrogen bonds (annealing). Bits of single-stranded DNA from different genes do not have perfectly complementary sequences, so they will not pair up in solution. This process of separating and rematching complementary pieces of DNA, known as DNA hybridization, is a fundamental principle behind many different molecular biology technologies.
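    Hybridization can be modeled as a test for complementary sequence. The sketch below is deliberately strict: it only accepts perfect matches, whereas real annealing depends on temperature and conditions. The sequences and the function names `reverse_complement`, `can_anneal`, and `binding_sites` are hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(strand):
    """Sequence of the strand that pairs perfectly with the input, written 5' to 3'."""
    return "".join(PAIR[base] for base in reversed(strand))

def can_anneal(strand1, strand2):
    """True only for a perfect, full-length complementary match (a simplification)."""
    return strand2 == reverse_complement(strand1)

def binding_sites(probe, target):
    """Start positions in the target (5' to 3') where the probe could pair perfectly."""
    site = reverse_complement(probe)
    return [
        i for i in range(len(target) - len(site) + 1)
        if target[i:i + len(site)] == site
    ]

strand = "GATTACAGG"
print(can_anneal(strand, "CCTGTAATC"))   # complementary partner
print(can_anneal(strand, "TTTTGGGGA"))   # unrelated sequence

probe = "CCTGTAATC"
target = "TTGATTACAGGTT"
print(binding_sites(probe, target))
```

    A short single-stranded probe used this way finds the position of its complementary sequence in a longer molecule, which is the idea behind the hybridization-based technologies mentioned in the text.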