Copyright © 2021 McGraw-Hill Education. All rights reserved. No reproduction or distribution without the prior written consent of McGraw-Hill Education.

chapter 10

Digital Analysis of DNA

Synopsis

The method of genome analysis described in Chapter 10 is at its heart very simple: Scientists chop up the DNA constituting a large genome into bite-sized pieces, using restriction enzymes or by mechanical shearing. The researchers then purify large amounts of individual bite-sized pieces by making and amplifying recombinant DNA molecules through the process of molecular cloning. The next step is to determine the DNA sequence of many bite-sized pieces, and then to use computer programs that recognize overlapping DNA sequences to reconstruct the sequence of the original whole genome. The job is still far from done even after the genome sequence data are available. The sequence by itself is not very useful until it is annotated; geneticists want to identify the various genes and other elements (like centromeres and telomeres) that allow the genome to function. That process is the subject of Chapter 11.

Key terms

genome – the DNA contained within all the chromosomes of an organism. Genomics is the study of genomes.

Human Genome Project – the collaborative effort to determine the sequence of all the nucleotides in the human genome and to annotate these sequences according to their functions.

restriction enzyme – an enzyme used by bacteria to attack the genomes of bacteriophage by cutting foreign DNA at specific sequences, producing smaller, linear restriction fragments. Bacteria protect their own genomes from restriction enzymes by making modification enzymes that add methyl groups to specific nucleotides so that restriction sites are no longer recognized by the restriction enzymes.

sticky ends versus blunt ends – If a restriction enzyme cuts the two strands of DNA at phosphodiester bonds between the same base pairs, the ends of the restriction fragments are blunt. If the cuts are offset from each other, the restriction fragments have either 5′ or 3′ overhangs (depending on the enzyme). If the overhangs of any two fragments produced by an enzyme can base-pair with each other, the ends are sticky and compatible.

mechanical shearing – breaking DNA at random locations by passing a DNA solution forcefully through a thin needle or by applying ultrasound energy.

gel electrophoresis – separating molecules according to their size by using an electric field to drive their migration through pores in a gel made of agarose or acrylamide. Smaller molecules move faster through the gel than large molecules because they can pass through more of the pores, so their path is less circuitous and shorter than that of larger molecules.

molecular cloning – isolation of large amounts of a given fragment of DNA that is joined in the test tube with a vector DNA, allowing propagation of the resultant recombinant DNA molecule. A preparation containing many identical copies of the same one recombinant DNA molecule is a DNA clone.

plasmid – a small circular DNA molecule that can replicate within bacterial cells independently of the bacterial chromosome. Plasmid DNAs are often used as vectors to construct recombinant DNA molecules with relatively small inserts (<20 kb) of foreign DNA.

selectable marker – gene carried on a vector that allows the selection of cells that contain recombinant DNA molecules.

BACs and YACs – Bacterial artificial chromosomes (BACs) and yeast artificial chromosomes (YACs) are vectors that allow the construction of recombinant DNA molecules with very large inserts of foreign DNA (hundreds to thousands of kb).

hybridization/annealing – the process by which single strands of DNA or RNA (such as a template and a primer) can form double-stranded molecules by base complementarity.

Sanger sequencing – a method invented by Frederick Sanger for determining the nucleotide sequence of a DNA molecule by the incorporation of fluorescently-labeled dideoxyribonucleotide triphosphates (ddNTPs) into a newly polymerized chain.

read – a digital file containing the nucleotide sequence determined from one molecule of DNA (or the many identical copies of this molecule found in a DNA clone).

whole-genome shotgun strategy – a method to sequence large genomes by determining a very large number of small, random sequences and then using the computer to assemble these into the sequence of the genome. Because genomes contain many repeated sequences, paired-end sequencing, a method where only the two ends of random clones are sequenced, is necessary to assemble the sequences of genomes when they are determined for the first time.

Problem Solving

Restriction enzyme recognition sites: When examining restriction enzyme recognition sites in DNA, you should draw out the site on both strands with careful attention to the 5′-to-3′ polarity, and then draw out the ends of the resultant molecule(s) after digestion with the restriction enzyme. If the cut DNAs represent the vector and insert being used to make a recombinant DNA molecule, draw out how the ends would go together in both strands, and look for complementarity and the nature of the resultant sequence after ligation.

The likelihood of finding a specific restriction site in DNA of random sequence depends on the length of the recognition site. A 4-bp recognition site is found once in every 4⁴ = 256 bp, while a 6-bp site is found once in every 4⁶ = 4096 bp. Remember that an N (any base pair) has a chance of 1 at any location, while a Y (pyrimidine) or R (purine) has a chance of 1/2 at any location.

To determine if a restriction enzyme cleavage produces blunt ends or ends with overhangs, ask whether the cleavage sites are at the same position on the two strands (blunt) or whether they are staggered with respect to each other (5′ or 3′ overhangs). Single-stranded overhangs will be sticky if the original site and the positions of the two cleavages exhibit rotational symmetry.
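The blunt-versus-overhang test described above can be sketched in a few lines of Python. This is only an illustration under the stated assumption that the recognition site is rotationally symmetric, so that the cut on the bottom strand mirrors the cut on the top strand; the function name `classify_cut` is hypothetical.

```python
def classify_cut(site_with_caret: str):
    """Classify the ends left by a restriction enzyme.

    `site_with_caret` is the top-strand site written 5'-to-3' with '^'
    marking the cut, e.g. 'G^GATCC'. ASSUMPTION: the site is rotationally
    symmetric, so the bottom strand is cut at the mirror-image position.
    Returns (end_type, overhang_sequence_read_5_to_3_on_top_strand).
    """
    cut = site_with_caret.index("^")      # number of bases left of the top-strand cut
    site = site_with_caret.replace("^", "")
    mirror = len(site) - cut              # bottom-strand cut, in top-strand coordinates
    if cut == mirror:
        return ("blunt", "")
    if cut < mirror:                      # cut lies left of the middle of the site
        return ("5' overhang", site[cut:mirror])
    return ("3' overhang", site[mirror:cut])


for name, site in [("BamHI", "G^GATCC"), ("SphI", "GCATG^C"), ("NaeI", "GCC^GGC")]:
    print(name, classify_cut(site))
```

A cut to the left of the middle gives a 5′ overhang, a cut to the right gives a 3′ overhang, and a cut exactly in the middle gives blunt ends.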

Vectors and the formation of recombinant DNA molecules: Remember that vectors must have (i) an origin of replication allowing their DNA to be copied independently of the host cell genome, and (ii) a selectable marker so that researchers can identify cells containing the recombinant DNAs. Most vectors also have one or more restriction enzyme sites into which the foreign DNA insert can be added to the vector; these restriction enzyme sites cannot interrupt either the origin of replication or the selectable marker.

Sanger DNA sequencing: Sequence traces such as that shown in Fig. 10.7f depict the newly synthesized molecules in which nucleotides are added successively onto the 3′ end of the primer due to complementarity with nucleotides in the template. The result of the sequencing reaction is a nested set of new DNA molecules. In all molecules in the nested set, the 5′ end is the 5′ end of the primer, while the 3′ end is a fluorescently labeled dideoxyribonucleotide.

Vocabulary

1. a. oligonucleotide 10. a short DNA fragment that can be synthesized by a machine

b. vector 12. a DNA molecule used for transporting, replicating, and purifying a DNA fragment

c. sticky ends 9. short single-stranded sequences found at the ends of many restriction fragments

d. recombinant DNA 7. contains genetic material from two different organisms

e. ddNTPs 11. DNA chain-terminating subunits

f. genomic library 2. a collection of the DNA fragments of a given species, inserted into a vector

g. genomic equivalent 8. the number of DNA fragments that are sufficient in aggregate length to contain the entire genome of a specified organism

h. gel electrophoresis 5. method for separating DNA molecules by size

i. selectable marker 1. gene in a vector that enables isolation of transformants

j. hybridization 4. stable binding of single-stranded DNA molecules to each other

k. primer 6. oligonucleotide extended by DNA polymerase during replication

l. polylinker 3. synthetic DNA element in a cloning vector with unique restriction sites used for insertion of foreign DNA

Section 10.1

2. a. Sau3A (^GATC) (i) Sau3A recognition sites are 4 base pairs (bp) long and are expected to occur randomly every 4⁴ or 256 bp. Because the human genome contains about 3 × 10⁹ bp, one would expect 3 × 10⁹/256 = 1.2 × 10⁷, or ~12,000,000, fragments. (ii) The enzyme would cut on average once every 4⁴ or 256 base pairs, so this should be the size of the average piece of the human genome cut with this restriction enzyme. (iii) Cleavage with this enzyme produces sticky ends with a 5′ overhang. Note that the cut site (^) is not in the middle of the recognition sequence, meaning that the enzyme cuts the two strands of the DNA target at different base pairs, so this is not a blunt end. Read 5′-to-3′, the cut site is to the left of the recognition sequence’s middle, so this is a 5′ overhang. (iv) All the ends of human DNA fragments produced by digestion with Sau3A would be identical (they are all 5′ GATC overhangs).

b. BamHI (G^GATCC) (i) The enzyme would cut random DNA once every 4⁶ or 4,096 bp. 3 × 10⁹/4,096 = ~732,400 fragments. (ii) 4,096 base pairs = ~4.1 kilobase pairs (kb). (iii) Sticky ends with a 5′ overhang. (iv) All ends are identical (they are all 5′ GATC overhangs).

c. Hpa II (C^CGG) (i) 3 × 10⁹/256 = ~12,000,000 fragments. (ii) 256 base pairs. (iii) Sticky ends with a 5′ overhang. (iv) All ends are identical (these are 5′ CG overhangs).

d. SphI (GCATG^C) (i) 3 × 10⁹/4,096 = ~732,400 fragments. (ii) 4,096 base pairs. (iii) Sticky ends with a 3′ overhang. The cut site is not in the middle of the recognition sequence, so this is not a blunt end. However, in this case the cut site is to the right of the middle of the recognition site, so this is a 3′ overhang. (iv) All ends are identical (these are CATG 3′ overhangs).

e. Nae I (GCC^GGC) (i) 3 × 10⁹/4,096 = ~732,400 fragments. (ii) 4,096 base pairs. (iii) Blunt ends. Note that the cut site is in the middle of the recognition sequence, and that the restriction enzyme cuts both strands at the same place. (iv) All ends are identical.

f. Ban I (G^GYRCC) (i) In this case, the restriction enzyme would be expected to cut DNA once every (4)(4)(2)(2)(4)(4) = 1,024 bp. (This is because Y bases could be either C or T, while R bases could be A or G.) Thus, 3 × 10⁹/1,024 = ~2,930,000 fragments. (ii) 1,024 base pairs. (iii) Sticky ends with a 5′ overhang. (iv) All ends are NOT identical. After digestion, some resultant sticky ends will be 5′ GCAC while others will be 5′ GCGC, 5′ GTAC, or 5′ GTGC overhangs.

g. Bst YI (R^GATCY) (i) 3 × 10⁹/1024 = ~2,930,000 fragments. The 1024 value is calculated as in part (f). (ii) 1024 base pairs. (iii) Sticky ends with a 5′ overhang. (iv) All ends are identical. In contrast with part (f), the actual overhangs produced with this restriction enzyme will all be 5′ GATC.

h. Bsl I (CCNNNNN^NNGG) (i) 3 × 10⁹/256 = ~12,000,000 fragments. Even though this recognition sequence is long, only 4 base pairs are defined, while all the others can be any base pair. So the chance of any random 11 base pair sequence being recognized by this enzyme is (1/4)(1/4)(1)(1)(1)(1)(1)(1)(1)(1/4)(1/4) = 1/256. (ii) 256 base pairs. (iii) Sticky ends with a 3′ overhang. (iv) All ends are NOT identical; the 3′ overhangs would be 3 bases long, but they could be any 3 bases.

i. Sbf I (CCTGCA^GG) (i) This recognition site is 8 bp long, so the enzyme would cut on average every 4⁸ or 65,536 bp. 3 × 10⁹/65,536 = ~45,800 fragments. (ii) 65,536 base pairs = ~65 kb. (iii) Sticky ends with a 3′ overhang. (iv) All ends are identical and would be TGCA 3′.
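The arithmetic used throughout Problem 2 can be checked with a short Python sketch. It assumes, as the answers do, random-sequence DNA with 25% of each base and a genome of 3 × 10⁹ bp; the names `MATCHES`, `expected_spacing`, and `expected_fragments` are illustrative only.

```python
from fractions import Fraction

# How many of the four bases each symbol used in this problem can stand for.
MATCHES = {"A": 1, "C": 1, "G": 1, "T": 1, "R": 2, "Y": 2, "N": 4}


def expected_spacing(site: str) -> Fraction:
    """Average distance (bp) between sites, ASSUMING 25% of each base."""
    probability = Fraction(1)
    for symbol in site.replace("^", ""):
        probability *= Fraction(MATCHES[symbol], 4)
    return 1 / probability


def expected_fragments(site: str, genome_bp: int = 3_000_000_000) -> int:
    """Approximate number of fragments from a complete digest of the genome."""
    return round(genome_bp / expected_spacing(site))


for site in ["^GATC", "G^GATCC", "G^GYRCC", "CCNNNNN^NNGG", "CCTGCA^GG"]:
    print(site, expected_spacing(site), expected_fragments(site))
```

Exact fractions are used so that degenerate positions (R, Y, N) multiply in without rounding error.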

3. a. Sau3A (^GATC): The enzyme cuts with a frequency of (.204)²(.296)² = .00364 cut sites per base pair. The average fragment size is therefore 1/.00364 ≈ 274 base pairs.

b. BamHI (G^GATCC): The enzyme cuts with a frequency of (.204)⁴(.296)² = .000152 cut sites per base pair. The average fragment is 1/.000152 ≈ 6,590 base pairs.

c. Hpa II (C^CGG): The enzyme cuts with a frequency of (.204)⁴ = .00173 cut sites per base pair. The average fragment is 1/.00173 ≈ 577 base pairs.

d. Sph I (GCATG^C): The enzyme cuts with a frequency of (.204)⁴(.296)² = .000152 cut sites per base pair. The average fragment is 1/.000152 ≈ 6,590 base pairs.

e. Nae I (GCC^GGC): The enzyme cuts with a frequency of (.204)⁶ = .0000721 cut sites per base pair. The average fragment is 1/.0000721 ≈ 13,875 base pairs.

f. Ban I (G^GYRCC): The enzyme cuts with a frequency of (.204)⁴(.204 + .296)² = .000433 cut sites per base pair. The average fragment is ≈ 2,310 base pairs.

g. Bst YI (R^GATCY): The enzyme cuts with a frequency of (.204 + .296)²(.204)²(.296)² = .000912 cut sites per base pair. The average fragment is 1/.000912 ≈ 1,097 base pairs.

h. Bsl I (CCNNNNN^NNGG): The enzyme cuts with a frequency of (.204)⁴ = .00173 cut sites per base pair. The average fragment is 1/.00173 ≈ 577 base pairs.

i. Sbf I (CCTGCA^GG): The enzyme cuts with a frequency of (.204)⁶(.296)² = .00000631 cut sites per base pair. The average fragment is 1/.00000631 ≈ 158,356 base pairs.
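When the four bases are not equally frequent, the same calculation weights each position by the frequency of the bases allowed there. The sketch below uses the base frequencies implied by the answers to Problem 3 (G = C = 0.204, A = T = 0.296); the function names are hypothetical, and small differences from the printed values are expected because the answers round the cut frequency before inverting it.

```python
import math

# Base frequencies implied by the worked answers above.
FREQ = {"A": 0.296, "T": 0.296, "G": 0.204, "C": 0.204}
# Symbols expanded to the bases they stand for (R = purine, Y = pyrimidine).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "N": "ACGT"}


def cut_frequency(site: str, freq=FREQ) -> float:
    """Chance that a given position in random DNA begins the recognition site."""
    return math.prod(sum(freq[b] for b in IUPAC[s]) for s in site.replace("^", ""))


def average_fragment(site: str, freq=FREQ) -> float:
    """Average fragment length in bp (the reciprocal of the cut frequency)."""
    return 1 / cut_frequency(site, freq)


for site in ["^GATC", "G^GATCC", "R^GATCY", "CCTGCA^GG"]:
    print(site, round(average_fragment(site)))
```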

4. Note that the DNA molecule depicted has two recognition sites for the Eco RI enzyme. When you cut a linear piece of DNA twice, you end up with three fragments. These fragments are shown in different colors as follows:

5′ AGATG     3′    5′ AATTCGCTGAAGAACCAAG     3′    5′ AATTCGATT 3′
3′ TCTACTTAA 5′    3′     GCGACTTCTTGGTTCTTAA 5′    3′     GCTAA 5′

Because the EcoRI restriction site is 6 bp long (G^AATTC), the enzyme would on average digest DNA of random sequence only once in every 4⁶ or 4096 bp. The original piece of DNA shown in this problem is much smaller than 4096 bp, so it would be unusual to find two recognition sites for EcoRI so close together. However, if the DNA that was digested is long (like the 3 billion bp in the human genome), it is likely that somewhere in the genome two such sites would be as close together as these are.
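The digestion in Problem 4 can be reproduced with a minimal Python sketch that works on the top strand only. The top-strand sequence is reassembled here from the three fragments given in the answer; the function name `digest_linear` and its parameters are illustrative.

```python
def digest_linear(top_strand: str, site: str = "GAATTC", cut_offset: int = 1):
    """Return the top-strand pieces of a linear molecule after complete digestion.

    `cut_offset` is the number of bases of `site` to the left of the cut
    (1 for EcoRI, G^AATTC).
    """
    pieces, start, search_from = [], 0, 0
    while (hit := top_strand.find(site, search_from)) != -1:
        cut = hit + cut_offset
        pieces.append(top_strand[start:cut])
        start, search_from = cut, hit + 1
    pieces.append(top_strand[start:])      # whatever remains after the last cut
    return pieces


# Top strand of the molecule in Problem 4, rebuilt from the three fragments.
molecule = "AGATGAATTCGCTGAAGAACCAAGAATTCGATT"
print(digest_linear(molecule))
# → ['AGATG', 'AATTCGCTGAAGAACCAAG', 'AATTCGATT']
```

Two cuts in a linear molecule give three pieces, in agreement with the answer above.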

5. The agarose gel through which the DNA is being electrophoresed has holes that are roughly the size of the DNA fragments to be analyzed. Smaller DNA fragments will find more holes through which they can travel, so they will move faster than larger DNA fragments (their paths will be straighter), whose movements will be retarded when they bump into holes that are not large enough for them to go through (their paths will be more circuitous and thus longer). You should note that the motive force per nucleotide exerted by the electric field is the same for all DNA fragments regardless of their sizes, because all nucleotides have about the same mass and the same charge; every fragment therefore has the same charge-to-mass ratio. Thus, the ability of a fragment to find holes through which it can travel is the main determinant of the mobility of a DNA fragment in the gel.

6. Gels of different pore sizes can be prepared easily by using different concentrations of agarose. The higher the concentration of agarose, the smaller the average size of the pore, because the agarose will take up a greater part of the volume. Agarose gels are usually between 0.5% and 2.5% agarose, meaning 0.5 g to 2.5 g agarose dissolved in a final volume of 100 ml of buffered solution. A 0.5% gel can resolve DNA fragments from ~1 kb to 3 kb. A 2.5% gel can resolve fragments from as small as ~50 bp to a maximum of ~2 kb.

7. Lane A: bacteriophage genome digested with EcoRI. Only a few bands are seen in the gel, so the genome must have been small. The bands are fewer and larger than those seen in lane C (the other sample in which the bacteriophage genome was digested, but in that case with an enzyme that recognizes a 4-bp sequence rather than the 6-bp sequence for EcoRI).

Lane B: human genomic DNA digested with EcoRI. A smear consisting of many bands is produced, so the genome must have been very large. These fragments are of larger average size than those in lane D (where the human genome was digested with the Hpa II enzyme).

Lane C: bacteriophage genome digested with Hpa II. Only a few bands are seen (so this must be bacteriophage DNA), but these are smaller than those seen in Lane A where the same DNA was digested with EcoRI, which has a larger recognition site.

Lane D: human genomic DNA digested with Hpa II. A smear of many bands is produced, so this must be a digest of human genomic DNA. These are smaller on average than those seen in Lane B, so the recognition site for the enzyme used in Lane D must be smaller than the recognition site for the enzyme used to digest the same DNA in Lane B.

8. a. Sample A represents the circular form. Compare the EcoRI digests of Sample A and Sample B. Sample A shows 3 bands while Sample B shows 4 bands. If you digest a circular molecule at 3 places, you will end up with 3 bands, while if you digest a linear molecule at the same 3 places, you will end up with 4 bands. The same pattern is seen for all the other digests: Sample A always has 1 band fewer than Sample B when the two samples are cut with the same enzyme(s).

b. You could add up the sizes of the restriction fragments seen in any one digest of Sample B (the linear form of the bacteriophage DNA) to find the total length of this molecule. For example, in the EcoRI digest, this would be 4 + 3 + 2 + 1 = 10 kb. You would find the same value for any of the other digests of Sample B. (Note: The bacteriophage λ genome is actually 48.5 kb.)

c. Using the same addition procedure as in part (b), the total length of the circular form of the bacteriophage DNA is 5 + 3 + 2 = 10 kb. The size of the bacteriophage DNA does not change when it is circularized.

d. The following figure diagrams the locations of all the restriction enzyme recognition sites (E = EcoRI; B = BamHI) and the sizes of the fragments that would be produced by digestion of either the circular (left) or linearized (right) form of the chromosome. The red triangles and red diamond indicate the locations of the sticky ends of the bacteriophage λ chromosome.
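The band counts in parts (a) through (c) follow from simple bookkeeping, sketched below in Python. The EcoRI cut positions (4, 7, and 9 kb from one end of the 10 kb linear molecule) are an assumption chosen to reproduce the fragment sizes in the answer; the sizes themselves only require that the 4 kb and 1 kb fragments lie at the two ends of the linear form, because those are the two pieces that merge into the 5 kb fragment of the circle.

```python
def fragment_sizes(length_kb, cut_positions, circular=False):
    """Fragment sizes from a complete digest of a linear or circular molecule."""
    cuts = sorted(cut_positions)
    if not cuts:
        return [length_kb]
    inner = [b - a for a, b in zip(cuts, cuts[1:])]   # pieces between adjacent cuts
    if circular:
        # The two end pieces of the linear map are joined into one fragment.
        return inner + [length_kb - cuts[-1] + cuts[0]]
    return [cuts[0]] + inner + [length_kb - cuts[-1]]


ecori_cuts = [4, 7, 9]   # ASSUMED positions, in kb
print(fragment_sizes(10, ecori_cuts))                  # linear form: 4 bands
print(fragment_sizes(10, ecori_cuts, circular=True))   # circular form: 3 bands
```

In both cases the fragment sizes sum to 10 kb, as in parts (b) and (c).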

9. a. With partial restriction digestion, the enzyme does not cut the genomic DNA at every recognition site that is present in the genome. Thus, the partial digest would produce fewer fragments of longer average sizes than would be produced by digesting the same DNA sample to completion with the same enzyme.

b. The fragments of DNA produced from the genomes of different cells by partial restriction digestion would be different from each other. In contrast with a complete digestion, in which every EcoRI recognition site would be cut in the DNA molecules from all the cells, in a partial digestion you would get a range of different fragments produced from different molecules of genomic DNA.

The figure that follows illustrates that three genomic DNA molecules from three different cells would in fact yield different fragments upon a partial digestion. (In this case, conditions were adjusted so that on average only one site out of five was cut; by digesting the DNA for longer or shorter times with the enzyme, researchers can produce fragments with different average sizes.)

Partial digestion of DNA with restriction enzymes has important practical implications. Because different genomic DNA molecules produce different fragments, scientists can use partial restriction digestion to make genomic libraries with overlapping fragments that allow reconstruction of the genomic DNA sequence.
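To see why partial digestion yields overlapping fragments, it can help to list every fragment that could arise when each site may or may not be cut. The Python sketch below does this for a hypothetical 20 kb linear molecule with sites at 4, 9, and 15 kb; the positions and function names are illustrative and are not taken from the figure.

```python
from itertools import combinations


def complete_digest_fragments(length, sites):
    """(start, end) pairs when every site is cut."""
    boundaries = [0] + sorted(sites) + [length]
    return list(zip(boundaries, boundaries[1:]))


def partial_digest_fragments(length, sites):
    """Every (start, end) pair obtainable when each site may or may not be cut."""
    boundaries = [0] + sorted(sites) + [length]
    return list(combinations(boundaries, 2))


sites = [4, 9, 15]   # HYPOTHETICAL site positions, in kb
print(complete_digest_fragments(20, sites))
print(partial_digest_fragments(20, sites))
```

The complete digest gives only adjacent, nonoverlapping pieces, whereas the partial digest also contains fragments such as (4, 15) and (9, 20) that overlap one another.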

10. a. No; each of the two ends of a genomic DNA fragment that was broken by shearing was made by a different breakage event.

b. The ends are not sticky, meaning that the ends are not all complementary in sequence and therefore cannot all hybridize (anneal) to each other or to a vector cut with a restriction enzyme. First, many of the fragments have blunt ends. Second, the fragments with overhangs do not all have the same kind of overhang; the overhangs can be any length, any base sequence, and either 5′ or 3′.

c. Only 5′ overhangs can be made blunt by DNA polymerase. Because DNA polymerase synthesizes DNA in the 5′-to-3′ direction, the 5′ overhang can be used as template for extension of the other DNA strand.

Both 5′ and 3′ overhangs can be made blunt with S1 nuclease, because the enzyme will degrade single-stranded DNA without regard to polarity.

Section 10.2

11. a. The purpose of molecular cloning is to isolate DNA corresponding to a small region of a complex genome. With cloning, scientists can purify enough of this small region of DNA to study in detail (for example, to determine the region’s nucleotide sequence).

b. Selectable markers in vectors provide a means of determining which cells in the transformation mix take up the vector. These markers are often drug resistance genes, so a drug can be added to the media and only those cells that have received and maintained the vector will grow into colonies.

c. The origin of replication in a plasmid vector allows the vector (and any recombinant DNA molecule that incorporates the vector) to replicate in a bacterial host cell independently of the bacterial chromosome. This property allows the plasmid to be transferred to daughter cells when the cells divide, and allows researchers to purify the plasmid away from the bacterial host's chromosome. Some plasmid origins of replication are so efficient that a single bacterial cell can contain hundreds of copies of the plasmid, allowing scientists to prepare large amounts of the plasmid DNA.

d. Polylinkers allow vectors to be opened (linearized) with any one (or two) of several different restriction enzymes so that DNA fragments with any one (or two) of a variety of types of ends can be ligated into the vector.

12. To make a recombinant DNA molecule, you would need (in addition to vector and insert DNA) the following enzymes: (c) a restriction enzyme to cut the vector and the genomic DNA so that the resultant pieces have compatible sticky ends; and (d) DNA ligase to stitch together the vector and insert.

13. Yes, this is possible if the enzymes recognize the same sequence but cut that sequence in a different way. Such enzyme pairs are called neoschizomers. For example, both Apa I and Bsp 120I recognize the sequence 5′ GGGCCC 3′, but one leaves a 5′ overhang (Bsp 120I: 5′ G^GGCCC 3′) and the other leaves a 3′ overhang (Apa I: 5′ GGGCC^C 3′). These sticky ends are not compatible because they cannot base pair with each other.

14. First, work through the digestion and ligation of the DNA fragments and the vector. The vector (red) is cut with BamHI, leaving a linear molecule (the dots connect with one another) and the following ends:

5′ …G     3′    5′ GATCC… 3′
3′ …CCTAG 5′    3′     G… 5′

The insert DNA (blue) is cut with Mbo I, leaving the following sticky ends. The N means any base; of course, the bases in the two strands are complementary.

5′ GATCN──────N     3′    5′ GATCN───────N     3′
3′     N──────NCTAG 5′    3′     N───────NCTAG 5′

The ligation of an Mbo I fragment to a BamHI sticky end will only occasionally create a sequence that can be digested by BamHI. It depends on the exact base sequence at the ends of the MboI fragment. The N in the sequence below indicates this ambiguity. In all cases the following sequence will be found:

5′ …GGATCN───────NGATCC… 3′
3′ …CCTAGN───────NCTAGG… 5′

a. 100% of the junctions can be digested with Mbo I because they all contain the recognition sequence 5′ GATC 3′.

b. A junction that can be digested with BamHI must have a C at the 3′ end of the Mbo I recognition sequence; in other words, the N following the 5′ GATC at the junction would need to be C. This would occur 1/4 or 25% of the time.

c. None of the junctions will be cleavable by Xor II because none of the possible junctions has the sequence 5′ CGATCG 3′.

d. Reading on one strand, a junction that can be digested by Bst YI must have a purine preceding the 5′ GATC and a pyrimidine following the 5′ GATC. In the previous diagram, the nucleotide preceding 5′ GATC is always a purine (it is a G originally derived from the vector), while the nucleotide following the 5′ GATC will be a pyrimidine 50% of the time in random DNA. Therefore, 50% of the junctions can be cleaved by BstYI.

e. In the human genome, an Mbo I site can be written 5′ …N₁GATCN₂… 3′ on one strand. After cleavage with Mbo I, the product would be 5′ GATCN₂… 3′. If this is now inserted into the BamHI–cut vector and the resulting site can be cut with BamHI, then N₂ must have been a C. Thus, the original site in the human genome was 5′ …N₁GATCC… 3′. For the restriction site to be a BamHI site in the human genome, N₁ must be a G. In random DNA, the chance that N₁ was NOT a G = 3/4.
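The fractions in parts (a) through (d) can be confirmed by enumerating the four possible junctions 5′ GGATCN 3′ and testing each against the recognition sites. This is a small sketch that assumes each of the four bases is equally likely at position N; the dictionary and function names are illustrative.

```python
import re

# Recognition sites from this problem, written as regular expressions
# (R = A or G, Y = C or T).
ENZYMES = {
    "MboI": "GATC",
    "BamHI": "GGATCC",
    "XorII": "CGATCG",
    "BstYI": "[AG]GATC[CT]",
}


def cleavable_fraction(pattern: str) -> float:
    """Fraction of vector-insert junctions 5' GGATCN 3' that contain `pattern`."""
    junctions = ["GGATC" + n for n in "ACGT"]   # ASSUMES N is A, C, G, or T with equal chance
    return sum(bool(re.search(pattern, j)) for j in junctions) / len(junctions)


for name, pattern in ENZYMES.items():
    print(name, cleavable_fraction(pattern))
```

Only the left junction is enumerated; the right junction (5′ NGATCC 3′) gives the same fractions by symmetry.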

15. After digestion, the vector (red) will look as follows. (It is now a linear molecule; the dots at the two ends are connected to each other.)

5′ …CGGATCCCCTAAGATG     3′    5′ AATTCCGCGCGCATCGGC… 3′
3′ …GCCTAGGGGATTCTACTTAA 5′    3′     GGCGCGCGTAGCCG… 5′

The insert fragment, which is the blue fragment from Problem 4 that has EcoRI sticky ends at both sides, is the following:

5′ AATTCGCTGAAGAACCAAG     3′
3′     GCGACTTCTTGGTTCTTAA 5′

a. The two possible recombinant molecules are:

5′ …CGGATCCCCTAAGATGAATTCGCTGAAGAACCAAGAATTCCGCGCGCATCGGC… 3′
3′ …GCCTAGGGGATTCTACTTAAGCGACTTCTTGGTTCTTAAGGCGCGCGTAGCCG… 5′

or

5′ …CGGATCCCCTAAGATGAATTCTTGGTTCTTCAGCGAATTCCGCGCGCATCGGC… 3′
3′ …GCCTAGGGGATTCTACTTAAGAACCAAGAAGTCGCTTAAGGCGCGCGTAGCCG… 5′

b. The fragment could be inserted into the vector in either of two orientations that still preserve the 5′-to-3′ polarity of the resultant recombinant molecules. Relative to each other, the two orientations are both flipped (in terms of the strands) and turned 180°.

c. The recombinant DNA molecule made (regardless of the orientation of the insert) will have only a single site for BamHI. This site (5′ GGATCC 3′) comes from the vector and lies near the left end of both sequences in the answers to part (a). The blue insert has no recognition sites for this enzyme.

d. After the recombinant DNA is digested with BamHI, only one fragment will exist (the circular recombinant molecule will be linearized). The size of this fragment would be the length of the pMBG36 vector (4271 bp) plus the size of the insert (19 bp), or 4290 bp.

e. The recombinant DNA molecule would have two sites for EcoRI; one at the left junction of the vector and insert, and the other at the right junction of the vector and insert. This is an important point: After ligation, the insert will be flanked by EcoRI sites at either side because both the vector and insert are contributing sequences that will make up these sites in the recombinant DNA.

f. After the recombinant DNA is digested with EcoRI, two fragments will exist; one is the vector and the other is the insert. One fragment will thus be 4271 bp and the other will be 19 bp. (Here, we count the single-stranded sticky ends as if they were one double-strand.)
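The two orientations in part (a) and the site counts in parts (c) and (e) can be checked with a short Python sketch. The sequences are the top strands given above; the helper names are illustrative. The flipped orientation is obtained by reading the insert's other strand 5′-to-3′, which is the AATT overhang followed by the reverse complement of the double-stranded portion.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


VECTOR_LEFT = "CGGATCCCCTAAGATG"      # vector top strand up to the EcoRI cut
VECTOR_RIGHT = "AATTCCGCGCGCATCGGC"   # vector top strand after the EcoRI cut
INSERT_TOP = "AATTCGCTGAAGAACCAAG"    # insert top strand, 5' AATT overhang first


def recombinants(insert_top: str):
    """Top strands of the two possible ligation products."""
    overhang, duplex = insert_top[:4], insert_top[4:]
    flipped = overhang + reverse_complement(duplex)   # the other strand, read 5'-to-3'
    return [VECTOR_LEFT + s + VECTOR_RIGHT for s in (insert_top, flipped)]


for molecule in recombinants(INSERT_TOP):
    # Print the sequence with its EcoRI (GAATTC) and BamHI (GGATCC) site counts.
    print(molecule, molecule.count("GAATTC"), molecule.count("GGATCC"))
```

Both orientations carry two EcoRI sites flanking the insert and one BamHI site from the vector.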

16. a. The two oligonucleotides can anneal (hybridize) to each other to form adapters as shown in red in the diagram that follows. The adapters have one BamHI–compatible sticky end and one EcoRI–compatible sticky end. When mixed with the insert molecules and ligase, the adapters will hybridize with and be ligated to the EcoRI ends of the inserts. Now the inserts can be ligated into the BamHI–cut vector via the BamHI–compatible ends of the adapters. In effect, you are adding extra nucleotides onto both ends of the insert such that each end now has BamHI compatible sticky ends.

b. The vector originally had an EcoRI site within the drug resistance gene (the selectable marker), and as shown in the preceding diagram the insert + adapter provides two more. Thus, the recombinant DNA has 3 EcoRI sites in total. The vector had one BamHI site originally, and ligation of the insert generated an additional one: the recombinant DNA has 2 BamHI sites.

c. The adapters would need to have one BamHI–compatible sticky end and one blunt end. For example, the two annealed oligonucleotides shown in red in the following figure would work:

Note that the sequence outside of the BamHI sticky end is not important, provided the base pairs are complementary (in the answer shown, all are G⋮C base pairs). Scientists often design adapter oligonucleotides with high G⋮C base pair content as G⋮C base pairs are stronger than A:T base pairs.

17. a. You need 4–5 genome equivalents to reach a 95% confidence level that you will find a specific unique DNA sequence.

b. The number of clones needed depends on the total size of the genome of your research organism and the average insert size in the vector. Plasmid vectors normally have inserts smaller than 15 kb, while inserts in bacterial artificial chromosome vectors (BACs) can be 300 kb, and inserts in yeast artificial chromosomes (YACs) can be as large as 2000 kb (= 2 Mb). Divide the number of base pairs in the genome by the average insert size, then multiply by five to get the number of clones in five genome equivalents.

18. a. The theoretical minimum number of recombinant BACs to cover the entire genome is the genome size divided by the average size of the insert in each BAC: 3,000,000,000 base pairs per genome/100,000 base pairs per BAC = 30,000 BACs/genome. Note that to cover the entire genome, each of the 30,000 BACs would need to contain a completely different (nonoverlapping) segment of the genome that would end at the base pair before the next begins. It is virtually impossible that you could ever construct such a library using randomly fragmented genomic DNA.

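b. f = 100,000 bp per BAC / 3,000,000,000 bp per genome ≈ .000034 genomes per BAC. (Alternatively stated, each BAC insert contains ~1/30,000 of the genome. The exact quotient is closer to .0000333; the rounded value .000034 is the one used in the calculations that follow.)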

c. For a 99% chance that a specific 100 kb region is represented: N = [ln (1−.99)] / [ln (1−.000034)] = 135,444 BACs. For a 99.9% chance that a specific 100 kb region is represented: N = [ln (1−.999)] / [ln (1−.000034)] = 203,170 BACs.

d. You calculated in part (a) that 30,000 BACs is 1 genomic equivalent. Therefore, 135,444 BACs are 135,444/30,000 ≈ 4.5 genomic equivalents; 203,170 BACs are 203,170/30,000 ≈ 6.8 genomic equivalents.

e. We can answer this question by using the equation provided and by performing algebra to solve for P. In this case, N = 30,000.

30,000 = [ln (1 − P)] / [ln (1 − 0.000034)]
30,000 × ln (1 − 0.000034) = ln (1 − P)
−1.02 = ln (1 − P)
e^(−1.02) = 1 − P
0.36 = 1 − P
P = 1 − 0.36 = 0.64

Therefore, the chance is 64% that any specific 100,000 bp region of the genome is represented in a recombinant BAC. Note that this is the same library you constructed in part (a).

f. One of the major lessons of this problem is that because ln(0) is undefined, you can never make a library in which the chance that any segment is represented is 100%. Instead, operationally you would always aim to construct a library with some predetermined probability P that any specific region is represented. If you are solving for N, and if P is already defined in this way, the only other variable is f , the fraction of the genome in a single recombinant clone.

For a given genome, f can be manipulated only by altering the size of the insert: The larger the average insert, the fewer clones would be needed to make the library. BACs and YACs are prized as vectors for constructing genome libraries because they can accommodate large inserts.
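The library-size relation used in parts (c) through (e), N = ln(1 − P) / ln(1 − f), is easy to explore numerically. The Python sketch below uses the rounded value f = 0.000034 from part (b) so that it reproduces the figures in the answers; the function names are illustrative.

```python
import math


def clones_needed(p: float, f: float) -> float:
    """N = ln(1 - P) / ln(1 - f)."""
    return math.log(1 - p) / math.log(1 - f)


def coverage_probability(n: float, f: float) -> float:
    """The same relation solved for P: P = 1 - (1 - f)^N."""
    return 1 - (1 - f) ** n


f = 0.000034   # rounded value from part (b); the exact quotient is nearer 0.0000333

print(round(clones_needed(0.99, f)))             # part (c), 99% chance
print(round(clones_needed(0.999, f)))            # part (c), 99.9% chance
print(round(coverage_probability(30_000, f), 2)) # part (e), one genomic equivalent
```

Because ln(1 − P) grows without bound as P approaches 1, `clones_needed` increases steeply for each additional "nine" of confidence, which is the point made in part (f).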

19. One of the issues encountered in molecular cloning (the construction of recombinant DNA molecules) is that after the vector is cut by a restriction enzyme, the two ends of the vector can come back together and be resealed by DNA ligase without any insert being incorporated at all. Such resealed vectors are undesirable because bacterial cells transformed with these DNAs would form colonies that could not be distinguished (except by time-consuming DNA sequencing) from colonies containing the recombinant DNA you were trying to make. One way to deal with this issue is to use vectors like that shown in the figure.

If the vector resealed without any insert (these are sometimes called empty vectors), the resultant colonies growing on ampicillin plates supplemented with X-gal would be blue in color, because these cells would make the β-galactosidase enzyme, and this enzyme would turn the colorless X-gal into a blue compound. If the vector contained an insert, the resultant colonies growing on the same plate would be white, because no enzyme and thus no blue compound would be made. This property of the vector allows researchers to identify white colonies that have recombinant DNA molecules and to ignore blue colonies that have resealed vectors without inserts.

20. a. The goal of the ligation is to generate recombinant DNA molecules that have attached one piece of frog DNA to one vector molecule. A ligation mixture consists of two components: linear double-stranded vector DNA with complementary EcoRI sticky ends (Fig. 10.4) at both ends, and linear double-stranded frog DNA, also with complementary EcoRI sticky ends at both ends.

DNA ligase simply connects a 3′–OH (hydroxyl) group to a 5′–P (phosphate). The most abundant products that occur in a ligation mixture are those produced by intramolecular ligation of the 5′ and 3′ ends of a vector molecule or an insert molecule to form a circle. The reason intramolecular ligation occurs most frequently is that these 5′ and 3′ ends are on the same DNA molecule, so they will bump up against each other the most often in the ligation solution. Of these intramolecular ligation products, circularized fragments of frog DNA cannot be recovered in transformants because these DNAs have no origin of replication and no selectable marker. However, as was discussed in Problem 19, recircularized (reconstituted) vector molecules without frog DNA inserts (empty vectors) will be recovered frequently in transformants.

The desired ligation product is produced by intermolecular ligation of a single vector molecule with a single frog insert molecule. The desired molecules will be made much less frequently than reconstituted vectors because the 5′ and 3′ ends to be joined are on different DNA molecules. (Other intermolecular ligations that produce chains with multiple vectors and/or inserts can also occur.)

To decrease the amount of reconstituted vector produced (that is, vector without insert), you treat the linear, digested vector with alkaline phosphatase. The following figures show why you would perform this procedure, using the pMBG36 vector (after cleavage) previously shown in Problem 15 as an example. The dots at the ends mean that the strands of the digested vector are contiguous except for where the cleavage took place. The figure also illustrates in black that after cleavage by EcoRI, the 5′ carbon of the nucleotide at the 5′ end of the sticky overhang is attached to a phosphate group, while the 3′ carbon of the nucleotide at the 3′ end has only a hydroxyl group and no phosphate. The vector is in red and the insert in blue.

5′ …CGGATCCCCTAAGATG-OH 3′          5′ p-AATTCCGCGCGCATCGGC… 3′
3′ …GCCTAGGGGATTCTACTTAA-p 5′      3′    HO-GGCGCGCGTAGCCG… 5′

Alkaline phosphatase removes the 5′ phosphate groups on the vector DNA molecule:

5′ …CGGATCCCCTAAGATG-OH 3′          5′ HO-AATTCCGCGCGCATCGGC… 3′
3′ …GCCTAGGGGATTCTACTTAA-OH 5′     3′     HO-GGCGCGCGTAGCCG… 5′

After the treatment with alkaline phosphatase, ligase cannot join a 3′ hydroxyl group to the dephosphorylated 5′ ends. Therefore, the two ends of the vector cannot be ligated to each other and this treated molecule will remain linear. If insert DNA is added, then the ligase will join the 3′ OHs on the vector with the 5′ Ps on the insert:

5′ …CGGATCCCCTAAGATG-OH  p-AATTCGCTGAAGAACCAAG-OH  HO-AATTCCGCGCGCATCGGC… 3′
3′ …GCCTAGGGGATTCTACTTAA-OH  HO-GCGACTTCTTGGTTCTTAA-p  HO-GGCGCGCGTAGCCG… 5′

You should note that after this ligation, each strand will still have one nick (a position without a phosphodiester bond) where an –OH from the vector abuts an –OH from the insert. But after the ligation mix is transformed into Escherichia coli cells, enzymes in the bacterial cells repair these nicks in the phosphate backbone.

As discussed in Problem 19, certain vectors are constructed so that they contain the lacZ gene with a restriction site right in the middle of the gene. If the vector recircularizes without inclusion of an insert, the lacZ gene will remain uninterrupted; if an insert has been cloned into the vector, the lacZ gene will be interrupted. The ligation mix is transformed into E. coli cells such that about one cell out of 1000 cells takes up a plasmid. The transformed cells are plated on medium containing ampicillin. Only the cells with a plasmid will grow, which eliminates untransformed cells as well as cells that took up only circularized inserts (the intramolecular ligation products that carry no antibiotic resistance gene). The medium also contains X-gal, a substrate for the β-galactosidase protein that is encoded by the lacZ gene. The β-galactosidase enzyme cleaves X-gal and produces a molecule that turns the cell blue. Those cells that took up an intact, recircularized vector with no insert will produce β-galactosidase and form blue colonies. The bacterial cells that took up a vector + insert (that is, recombinant DNA) will not be able to produce functional β-galactosidase and will form white colonies.

The non-dephosphorylated vector (that is, with no prior alkaline phosphatase treatment) recircularized at a high frequency, leading to 99/100 blue colonies after transformation with the ligation mix. The dephosphorylated vector (that had been treated with alkaline phosphatase) formed 99/100 white colonies after transformation with the ligation mix; almost all the vectors had an insert of frog genomic DNA.

b. Yes, the suggestion was a good one. Dephosphorylation of the vector increased the fraction of colonies with recombinant DNA molecules (vector + insert) roughly 100-fold, from about 1/100 white colonies to 99/100.

c. The choice of whether to dephosphorylate the vector versus the insert DNA is based on an understanding of the mechanics of the bacterial transformation that is carried out after the ligation. If the vector is dephosphorylated, it cannot self-ligate. The insert can self-ligate. As mentioned in part (a), the self-ligated inserts do not have any vector DNA, so they have neither a bacterial origin of replication (ORI) nor a gene encoding antibiotic resistance. Therefore, these recircularized DNAs will not allow the transformed bacteria to grow on the selective medium. If the insert were dephosphorylated instead, it would not self-ligate, but the vector would. The vector has the antibiotic resistance gene and ORI, so the empty vector would be propagated in E. coli, generating a high level of background: colonies that survive drug selection but are transformed only with empty vector.
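The reasoning in parts (a) and (c) reduces to one rule: ligase seals a strand only where a 3′–OH meets a 5′ phosphate. The following Python sketch encodes that rule as a toy model. The Fragment class and the function names are hypothetical, introduced only for illustration, and the model ignores ligation kinetics and multi-fragment chains.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fragment:
    name: str
    has_5p_phosphate: bool  # do the 5' ends still carry a phosphate?
    has_ori_and_amp: bool   # origin of replication plus antibiotic resistance?

def sealed_strands(left, right):
    """Number of strands (0 to 2) that ligase can seal at one sticky-end junction.

    At a junction, each strand pairs a 3'-OH from one fragment with the 5' end
    of the other, and ligase needs that 5' end to carry a phosphate.
    """
    return int(left.has_5p_phosphate) + int(right.has_5p_phosphate)

def can_self_ligate(fragment):
    """A fragment circularizes only if ligase can seal its own two ends."""
    return sealed_strands(fragment, fragment) > 0

def background_colonies(vector, insert):
    """True if a self-ligated molecule can yield colonies on selective medium."""
    return any(can_self_ligate(f) and f.has_ori_and_amp for f in (vector, insert))

vector = Fragment("vector", True, True)
vector_ap = Fragment("vector + phosphatase", False, True)
insert = Fragment("insert", True, False)
insert_ap = Fragment("insert + phosphatase", False, False)

for v, i in ((vector, insert), (vector_ap, insert), (vector, insert_ap)):
    print(f"{v.name} / {i.name}: "
          f"strands sealed per vector-insert junction = {sealed_strands(v, i)}, "
          f"empty-vector background = {background_colonies(v, i)}")
```

In this model, only the combination of phosphatase-treated vector and untreated insert removes the empty-vector background while still sealing one strand at each vector-insert junction, leaving the single nick per junction that the bacterial cell later repairs.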

Section 10.3

21. Assuming the DNA molecule to be sequenced is already in hand, the only enzyme you would need to sequence DNA is DNA polymerase (a).

22. a. Reading the peaks from left-to-right (corresponding to the 5′-to-3′ direction because DNA polymerase successively adds nucleotides to the 3′ end), the sequence would be 5′ TTTGCTTTGTGAGCGGATAACAA 3′. [Note that this is NOT the sequence of the primer, but instead the sequence of part of the human DNA insert.]

b. The sequence written in part (a) is of the strand that is being newly synthesized in the sequencing reaction.

c. You know how to design the primer because: (1) You added the insert into the vector at a known location defined by a specific restriction enzyme; and (2) you know the sequence of the vector that flanks the position of the insert.

d. The smallest molecule synthesized in the sequencing reaction that contains dideoxy G would be:

5′ GCCTCGAATCGGGTACCTTTG* 3′

The asterisk indicates the dideoxy G, which must be at the 3′ end because the chain terminates when this dideoxy nucleotide has been incorporated. The primer is underlined. (This answer assumes that the smallest red peak in the sequence analysis represents the smallest possible DNA that could be made in the sequencing reaction.)

e. If dATP were left out of the reaction, the only A nucleotides present would be ddATPs. No synthesis products could proceed past the first ddA incorporated (the first T encountered in the template). The resultant data would look like this:

If ddATP were omitted instead, the “A” peaks would be absent because none of the synthesis products would end with ddA. Because the paper on which the peaks are recorded runs at a constant speed through the detector, blank spaces would appear where the A peaks would have been.

f. As long as the peaks are clear and distinguishable, the different peak heights are irrelevant to the data analysis. The peak height represents the efficiency at which DNA polymerase adds a particular dideoxy nucleotide relative to a normal deoxy nucleotide at a particular position in the sequencing reaction. Small differences in these relative efficiencies do not alter the identity of the nucleotide at that position.
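Parts (d) and (e) can be reproduced with a few lines of Python. This is an illustrative sketch under stated assumptions. The read is the sequence from part (a). The primer sequence is an assumption, inferred by removing the newly synthesized TTTG from the molecule given in part (d). The function names are hypothetical.

```python
READ = "TTTGCTTTGTGAGCGGATAACAA"  # newly synthesized strand from part (a), 5' to 3'
PRIMER = "GCCTCGAATCGGGTACC"      # assumed: part (d) molecule minus its last four bases

def terminated_products(primer, read, base):
    """Every chain-terminated molecule that ends in the dideoxy form of base."""
    return [primer + read[: i + 1] for i, b in enumerate(read) if b == base]

def smallest_product(primer, read, base):
    """Shortest molecule ending in the dideoxy form of base (part d)."""
    return terminated_products(primer, read, base)[0]

def longest_product_without_dntp(primer, read, base):
    """Longest molecule possible when the normal dNTP for base is left out (part e).

    Only the dideoxy form of that base is available, so no chain can extend
    past the first position where the base is required.
    """
    return primer + read[: read.index(base) + 1]

print(smallest_product(PRIMER, READ, "G") + "*")
# prints GCCTCGAATCGGGTACCTTTG*
print(longest_product_without_dntp(PRIMER, READ, "A") + "*")
```

Omitting ddATP instead corresponds to dropping every molecule in terminated_products(PRIMER, READ, "A") from the data, which is why gaps appear at the A positions.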

Section 10.4

23. The key point to genomic sequencing is that the fragments to be sequenced need to overlap; otherwise you would have no way of understanding their relationship to each other in the original genome. This problem explores several methods to achieve such overlap.

a. Mechanical shearing will fragment DNA at random locations, ensuring that overlap will be obtained if enough fragments are analyzed. Because the genomic DNA sample is obtained from many cells, the fragments from different cells will not have the same beginning and end points; thus, they will overlap.

b. In this case, overlap is obtained by making different libraries of the same genome, each library constructed with a different restriction enzyme (for example, one library made with the EcoRI enzyme and a second library with the BamHI enzyme). The recognition sites for different enzymes are at different positions in the genome to be analyzed, so that the fragments in any region obtained from the libraries will overlap.

One problem with constructing a library by complete digestion of a genome with a given restriction enzyme is that by chance, some fragments may be too large to be cloned or analyzed, while others are too small. Thus, another advantage of using multiple libraries made with different restriction enzymes is to minimize the chance that a region will not be covered. For example, a too-long fragment with EcoRI sites at its ends may have internal sites for BamHI or some other enzyme.

c. You could perform a partial digestion with the single restriction enzyme. As shown in the figure accompanying the answer to Problem 9 above, partial digestion will ensure overlap because different genomic DNA molecules will be cut at different sets of recognition sites.

If the restriction enzyme recognizes sites that are relatively far apart in the genome (for example, a 6 bp recognition sequence will be found on average every 4 kb, but an 8 bp sequence will be found on average every 64 kb), any approach using a single enzyme will suffer because some fragments are too large to clone. It will be preferable to use an enzyme that recognizes a 4 bp sequence because the sites will be spaced closer together. The more potential locations of digestion, the more the fragments would resemble a library made by mechanical shearing with cuts made at random locations. If you adjust the partial digestion conditions properly, you could construct a library that would have inserts with an average size of any desired length.
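The site spacings quoted in part (c) follow from the chance of finding a specific n-base sequence at a random position. The short Python sketch below reproduces them. It assumes the four bases are equally frequent and independent of their neighbors, which is an idealization, and the function name is illustrative.

```python
def expected_spacing_bp(site_length):
    """Average distance in bp between recognition sites of the given length,
    assuming each base is A, C, G, or T with probability 1/4."""
    return 4 ** site_length

for n in (4, 6, 8):
    print(f"{n} bp site: one site every {expected_spacing_bp(n):,} bp on average")
# 4 bp site: one site every 256 bp on average
# 6 bp site: one site every 4,096 bp on average
# 8 bp site: one site every 65,536 bp on average
```

The 4,096 bp and 65,536 bp values are the "4 kb" and "64 kb" figures in the text, and the 256 bp spacing shows why a 4 bp cutter offers many more potential cut positions for a partial digest.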

24. The primers must be designed based on DNA sequences you already know. In the case of the recombinant DNA construct, you already know (from the figure accompanying Problem 15) the sequence of the pMBG36 plasmid vector surrounding the site into which foreign DNA will be inserted. These primers must “point into” the insert, in the sense that at each vector/insert junction, the 3′ end of the primer should be closer to the insert than the 5′ end of the primer. In this way, the new nucleotides added during the sequencing reaction will be complementary to nucleotides in the insert. (After all, the insert is ultimately what you want to sequence.) One primer would be:

5′ CGGATCCCCTAAGATGAATTC 3′

The other primer would be:

3′ CTTAAGGCGCGCGTAGCCG 5′ = 5′ GCCGATGCGCGCGGAATTC 3′

Note that to be as long as possible, both primers will include the EcoRI recognition site (underlined) at their 3′ ends, because you know that all the inserts must have this site at both junctions with the vector.
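The second primer is the reverse complement of the top-strand vector sequence to the right of the insert. A minimal Python sketch of that conversion follows. The two top-strand strings are taken from the primers given above, and the variable and function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

ECORI_SITE = "GAATTC"

# Top-strand vector sequence at each junction, 5' to 3', including the EcoRI site.
LEFT_JUNCTION_TOP = "CGGATCCCCTAAGATGAATTC"
RIGHT_JUNCTION_TOP = "GAATTCCGCGCGCATCGGC"

# The left primer is the top strand itself, so its 3' end points into the insert.
left_primer = LEFT_JUNCTION_TOP
# The right primer must be the bottom strand, so that it also points into the insert.
right_primer = reverse_complement(RIGHT_JUNCTION_TOP)

print(left_primer)
print(right_primer)  # prints GCCGATGCGCGCGGAATTC
```

Both primers end in GAATTC at their 3′ ends, matching the note about the EcoRI recognition site.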

25. Whole-genome shotgun sequencing of small fragments is now a workable strategy for sequencing an individual human’s genome because we already have the human reference sequence; paired-end reads of larger inserts are no longer necessary. Using BLAST, each short sequence read can be matched to a position on the human reference sequence. The reference sequence indicates the positions of all of the transposable element repeats, as well as the unique sequences upstream and downstream of them. Thus, the problem of correctly assembling unique sequences that flank transposable elements no longer exists.

26. The solution to this problem depends on understanding two key points. First, the two end-reads from a recombinant DNA clone are derived from opposite strands of the inserted genomic fragment. Second, the size of this fragment is roughly known because of the way the library was constructed.

a. The data are consistent with the two maps shown in the following diagram. In the diagram, Locus 1 and Locus 2 are labeled arbitrarily, and the orientation of each map is also arbitrary. Not all elements are drawn to scale; for easy visualization, the arrows representing the end-reads and the rectangles representing the repeated sequences are disproportionally large.

The A and B reads from each clone must have 5′-to-3′ orientations pointing toward each other. These reads are about 2 kb apart for clones 1−4, and about 4 kb apart for clones 5 and 6. Note that clones 1 and 2 must come from different loci because they do not have the same number of dinucleotide repeats. By keeping track of the assumptions and the number of repeating units, it becomes clear that clones 1 and 4 must overlap, and that clones 2 and 3 must overlap.

b. One of the loci (Locus 2 in the diagram) has 12 copies of the repeating unit. The data do not define the number of repeating units of CT at Locus 1, although this number must be at least 17.

c. The data are incompatible with the hypothesis that only a single locus is represented. Note that the DNA sequences directly flanking the (CT)n repeats are completely different at the two loci. If the two repeats were alleles that differed only in the number of repeating units, the flanking sequences should be the same.

In Chapter 12 you will see that regions of DNA with repeats of simple sequences like (CT)n provide useful markers for specific locations in the genome. The reason is that at a given locus (genomic location), different chromosomes may have different numbers of repeating units that constitute different alleles of the locus. Several techniques allow scientists to track such alleles with ease.
