BioSci 145B lecture 4 page 1 © copyright Bruce Blumberg 2004. All rights reserved

BioSci 145B Lecture #4 4/27/2004

• Bruce Blumberg
  – 2113E McGaugh Hall - office hours Wed 12-1 PM (or by appointment)
  – phone 824-8573
  – [email protected]
• TA – Curtis Daly cdaly@uci.edu
  – 2113 McGaugh Hall, 924-6873, 3116
  – Office hours Tuesday 11-12
• lectures will be posted on web pages after lecture
  – http://eee.uci.edu/04s/05705/ - link only here
  – http://blumberg-serv.bio.uci.edu/bio145b-sp2004
  – http://blumberg.bio.uci.edu/bio145b-sp2004




Vectors for cDNA cloning

• Plasmids vs phage
  – phage preferred for high density manual screening
  – plasmids are better for functional screening
    • microinjection
    • transfection
    • panning
  – phage packaging and infection more efficient than electroporation
    • 10-100x better than best transformation frequency
• what will the library be used for?
  – Consider the intended use as well as other contemplated uses
    • will the library go to an EST project?
      – plasmid
    • will it be screened manually?
      – phage
    • or arrayed and screened on high density filters?
      – plasmid
    • will we normalize it?
      – probably plasmid


Vectors for cDNA cloning (contd)

• Analysis of cDNAs obtained
  – rate limiting step in clone analysis is getting them into a usable form
    • usually a plasmid
  – cloning is tedious, particularly if one has many positives
    • some tricks can be used but this is still the bottleneck
• in about 1985 or so, Stratagene introduced lambda ZAP
  – phage with an embedded plasmid and M13 packaging signals
  – plasmid can be automatically excised by adding a helper phage
    • gene II protein replicates plasmid into single stranded phagemid which is secreted
  – major advance, most phage libraries today are made in ZAP
  – early protocols had helper phage problems - solved
• later, others developed a Cre-lox based system
  – uses loxP sites instead of M13 signals
  – when Cre recombinase is added, recombination between the loxP sites excises a plasmid
• both work very well and make analysis of many clones very straightforward


Vectors for cDNA cloning (contd)


Vectors for cDNA cloning (contd)


mRNA frequency and cloning

• mRNA frequency classes
  – classic references
    • Bishop et al., 1974 Nature 250, 199-204
    • Davidson and Britten, 1979 Science 204, 1052-1059
  – abundant
    • 10-15 mRNAs that together represent 10-20% of the total RNA mass
    • abundance of each > 0.2%
  – intermediate
    • 1,000-2,000 mRNAs together comprising 40-45% of the total
    • 0.05-0.2% abundance
  – rare
    • 15,000-20,000 mRNAs comprising 40-45% of the total
    • abundance of each is less than 0.05% of the total
    • some of these might only occur at a few copies per cell
• How does one go about identifying genes that might only occur at a few copies per cell?
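
To put rough numbers on the frequency classes above, here is a minimal sketch of how many clones one would have to screen to expect at least one copy of a given mRNA. It assumes clones are sampled independently and that a transcript makes up a fraction f of the library, so that N = ln(1 - P) / ln(1 - f). This sampling estimate is not taken from the slides; the function name and the 99% default are illustrative assumptions.

```python
import math


def clones_to_screen(frequency: float, probability: float = 0.99) -> int:
    """Clones needed to see a transcript at least once.

    frequency   -- fraction of the library made up by the transcript
                   (e.g. 0.002 for an mRNA at 0.2% abundance)
    probability -- desired chance of finding at least one clone
                   (0.99 is an assumed, illustrative default)
    """
    if not 0.0 < frequency < 1.0:
        raise ValueError("frequency must be between 0 and 1")
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must be between 0 and 1")
    # N = ln(1 - P) / ln(1 - f); log1p keeps precision for tiny f
    return math.ceil(math.log(1.0 - probability) / math.log1p(-frequency))


if __name__ == "__main__":
    # Abundance values follow the class boundaries on the slide;
    # 0.001% is an assumed example of a very rare message.
    for label, f in [("abundant (0.2%)", 0.002),
                     ("intermediate/rare boundary (0.05%)", 0.0005),
                     ("very rare (0.001%)", 0.00001)]:
        print(f"{label}: screen about {clones_to_screen(f):,} clones")
```

Under these assumptions the required number of clones grows roughly as 1/f, which is why rare messages motivate the normalization and subtraction approaches that follow.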


Normalization and subtraction

• How to identify genes that might only occur at a few copies per cell?
  – alter the representation of the cDNAs in a library or probe
  – Normalization - process of reducing the frequency of abundant and increasing the frequency of rare mRNAs
    • Bonaldo et al., 1996 Genome Research 6, 791-806
  – Subtraction - removing cDNAs (mRNAs) expressed in both populations, leaving only the differentially expressed ones
    • Sagerström et al. (1997) Ann Rev. Biochem 66, 751-783


Normalization and subtraction

• Normalization - reduce abundant, increase rare mRNAs
  – normalization should bring cDNA abundance to within 10x
    • rarely works this well
    • typically, abundant genes reduced 10x, rare ones increased 3-10x
    • intermediate class genes do not change much at all
  – Approach
    • make a population of cDNAs single stranded - tester
    • hybridize with a large excess of cDNA or mRNA (the driver) to Cot½ = 5.5
    • Cot½ value is critical for success of normalization
      – 5-10 optimal, higher values NOT better
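
The dependence on Cot can be illustrated with a toy model. The sketch below assumes ideal second-order reassociation kinetics, in which the fraction of a species still single stranded is 1 / (1 + k · f · Cot), where f is the species' fraction of the driver. The rate constant `k` and the abundance values are hypothetical numbers chosen only for illustration; none of them come from the slides or from the Bonaldo review.

```python
def fraction_single_stranded(abundance: float, cot: float, k: float = 2000.0) -> float:
    """Fraction of one cDNA species left single stranded after hybridization.

    abundance -- fraction of the driver made up by this species
    cot       -- driver concentration x time (arbitrary but consistent units)
    k         -- HYPOTHETICAL rate constant for a pure species
    """
    return 1.0 / (1.0 + k * abundance * cot)


def relative_abundance_after(abundances: dict, cot: float) -> dict:
    """Composition of the surviving single-stranded pool, renormalized to 1."""
    remaining = {name: f * fraction_single_stranded(f, cot)
                 for name, f in abundances.items()}
    total = sum(remaining.values())
    return {name: value / total for name, value in remaining.items()}


if __name__ == "__main__":
    # Illustrative abundances: one abundant and one rare message
    species = {"abundant": 0.002, "rare": 0.00001}
    for cot in (5.5, 500.0):
        left = {name: fraction_single_stranded(f, cot) for name, f in species.items()}
        print(f"Cot = {cot}: fraction left single stranded = {left}")
```

In this toy model a moderate Cot removes most of the abundant species while leaving most of the rare one single stranded, whereas a very high Cot also drives most of the rare species into duplex, which is one plausible reading of why higher values are not better.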


Normalization and subtraction (contd)

– Approach (contd)
  • various approaches to make driver
    – use mRNA - may not be easy to get
    – make ssRNA by transcribing library
    – make ssDNA by gene II/ExoIII treatment of inserts from plasmid library
    – PCR amplification of library
  • best approach is to use driver derived from the same library by PCR
    – rapid, simple and effective
    – other approaches each have various technical difficulties
    – see the Bonaldo review for details


Normalization and subtraction (contd)

– What are normalized libraries good for?
  • EST sequencing
  • gene identification
    – biggest use is to reduce the number of cDNAs that must be screened
    – good general purpose target to screen
      » subtracted libraries are useful but limited in utility
– Drawbacks
  • Not trivial to make
  • Size distribution of library changes
    – Longer cDNAs lost


Normalization and subtraction (contd)

• Subtraction - removing cDNAs (mRNAs) expressed in both populations, leaving only the differentially expressed ones
  – Sagerström et al. (1997) Ann Rev. Biochem 66, 751-783
• +/- screening - St. John and Davis (1979) Cell 16, 443-452
  – Hybridize the same library with probes prepared from two different sources and compare the results
    • example - hybridize normal liver cDNA library with probes from normal and cancerous liver
  – Colonies or plaques that hybridize more strongly with the target tissue (tumor) probe than with the control probe are picked
  – Why aren’t all colonies labeled in normal tissue?


Normalization and subtraction (contd)

• +/- screening (contd)
  – Advantages
    • Relatively simple approach
    • Doesn’t require difficult manipulations on probes
  – Disadvantages
    • Housekeeping genes often appear to be differential
    • Sensitivity less than subtracted screening
      – +/- screening typically requires >10 fold difference in expression levels using standard methods
    • not widely used any longer BUT
    • microarray analysis is really just a refined version of +/- screening
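
The selection step in +/- screening can be sketched as a fold-change filter over paired hybridization signals. Everything in the example below is hypothetical: the clone names, the signal values and the background `floor` are invented for illustration, and only the roughly 10-fold threshold comes from the slide.

```python
def pick_differential(signals: dict, min_fold: float = 10.0, floor: float = 1.0) -> list:
    """Return clones whose target signal exceeds control by at least min_fold.

    signals -- maps clone name to (control_signal, target_signal)
    floor   -- assumed background level, which also avoids division by zero
    """
    picked = []
    for clone, (control, target) in signals.items():
        fold = max(target, floor) / max(control, floor)
        if fold >= min_fold:
            picked.append(clone)
    return sorted(picked)


if __name__ == "__main__":
    # Hypothetical filter signals: (normal liver probe, tumor probe)
    example = {
        "clone_A": (100, 120),  # housekeeping-like, similar in both
        "clone_B": (5, 80),     # 16-fold higher with the tumor probe
        "clone_C": (0, 40),     # seen only with the tumor probe
        "clone_D": (50, 300),   # 6-fold, below the threshold
    }
    print(pick_differential(example))
```

A clone such as the hypothetical `clone_D` shows the sensitivity limit noted on the slide: a real 6-fold difference is missed when the method needs about 10-fold.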


Normalization and subtraction (contd)

• Subtractive screening - Sargent and Dawid (1983) Science 222, 135-139
  – Make 1st strand cDNA from a tissue and then hybridize it to excess mRNA from another
    • larger Cot½ is best, >20 at least – WHY?
  – remove double stranded material -> common sequences
  – make a probe or library from the remaining single stranded cDNA


Normalization and subtraction (contd)

• Subtractive screening (contd)
  – benefits
    • sensitive
    • can simultaneously identify all cDNAs that are differentially present in a population
    • good choice for identifying unknown, tissue specific genes
  – drawbacks
    • easy to have abundant housekeeping genes slip through
      – multistage subtraction is best
      – in effect normalize first, then subtract
    • libraries have limited applications
      – may not be useful for multiple purposes


Normalization and subtraction (contd)

– rule of thumb
  • make a high quality representative library from a tissue of interest
  • save subtraction and other fancy manipulations for making probes to screen such libraries with
    – unlimited screening
    – easy to use libraries for different purposes, e.g. the liver library
      » hepatocarcinoma
      » cirrhosis
      » regeneration specific genes


Nobel Prize in Chemistry 1980: Walter Gilbert (Harvard) & Frederick Sanger (MRC Labs) (Sanger also won the Nobel in 1958 for protein sequencing)

DNA sequence analysis

• DNA sequencing = determining the nucleotide sequence of DNA
  – Two main methods
  – shared Nobel prize in 1980
    • Chemical cleavage – Maxam and Gilbert
    • Enzymatic sequencing (based on polymerization reaction)


DNA sequence analysis

• Maxam and Gilbert
  – One of the first reasonable sequencing methods
  – Very popular in late 70s and early 80s
  – VERY TEDIOUS!!
    • Totally superseded by dideoxy sequencing now


DNA sequence analysis (contd)

• Dideoxy sequencing – Sanger 1977
  – Virtually all sequencing is done this way now
  – Requires modified nucleotide
    • 2’,3’-dideoxy NTP (ddNTP)
  – DNA polymerase incorporates the ddNTP and chain elongation terminates
  – Original method used 4 separate elongation reactions
  – Products separated by denaturing PAGE and visualized by autoradiography


DNA sequence analysis (contd)

• Dideoxy sequencing (contd) – Sanger 1977
  – Dideoxy NTPs present at ~1% of [dNTP]
  – Each reaction yields fragments with an identified end (the ddNTP used)
  – In principle, all possible chain lengths are represented
    • distribution varies with [dNTPs], [ddNTPs], [primer] and [template] and their ratios
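
Why a low ddNTP fraction gives a ladder of all chain lengths can be seen with a simple geometric model. The sketch below assumes, as a deliberate simplification, that every position terminates independently with the same probability p (taken as the ~1% ddNTP:dNTP ratio from the slide) and that the polymerase incorporates both nucleotide types equally well. In a real single-ddNTP reaction only positions with the matching base can terminate.

```python
def termination_fraction(position: int, p_terminate: float = 0.01) -> float:
    """Fraction of chains that end exactly at `position` (1-based).

    Assumes each position terminates independently with probability
    p_terminate, so chain lengths follow a geometric distribution.
    """
    if position < 1:
        raise ValueError("position must be >= 1")
    return p_terminate * (1.0 - p_terminate) ** (position - 1)


def fraction_longer_than(length: int, p_terminate: float = 0.01) -> float:
    """Fraction of chains that extend past `length` without terminating."""
    return (1.0 - p_terminate) ** length


if __name__ == "__main__":
    for n in (1, 100, 500, 1000):
        print(f"ends at base {n}: {termination_fraction(n):.5f}, "
              f"still extending past {n}: {fraction_longer_than(n):.5f}")
```

Raising p shifts the ladder toward short products and lowering it toward long ones, which is one way to read the slide's point that the distribution depends on the nucleotide ratios.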


DNA sequence analysis (contd)

[Figure: sequencing gel lanes labeled A, C, G, T]


Automated DNA sequence analysis

• How to improve throughput of sequencing?
  – Incorporate fluorescent ddNTPs, separate products by PAGE
    • Base calling and lane calling issues
  – Key advance was capillary sequencers
    • Separate DNA in a thin capillary instead of gel
    • Very accurate, no tracking errors, much more automation friendly

1. Trace files (dye signals) are analyzed and bases called to create chromatograms.

2. Chromatograms from opposite strands are reconciled with software to create double-stranded sequence data.
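
Reconciling reads from opposite strands, as described for this slide, amounts to reverse-complementing one read and comparing it base by base with the other. The sketch below is a toy version under strong assumptions: both reads cover exactly the same region, are the same length, and contain no insertions or deletions. Real software also aligns the reads and weighs quality scores; the function names here are hypothetical.

```python
# Translation table for complementing bases; N marks an ambiguous call
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def reconcile(forward_read: str, reverse_read: str) -> str:
    """Merge a forward read with a read from the opposite strand.

    Positions where the two strands disagree are reported as N.
    """
    flipped = reverse_complement(reverse_read)
    if len(flipped) != len(forward_read):
        raise ValueError("toy example requires reads of equal length")
    return "".join(f if f == r else "N"
                   for f, r in zip(forward_read.upper(), flipped))


if __name__ == "__main__":
    forward = "ATGCA"
    print(reconcile(forward, reverse_complement(forward)))  # strands agree
    print(reconcile(forward, "TGCTT"))                      # one disagreement
```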


Automated DNA sequence analysis

• Capillaries vs gels
  – Capillaries much faster – higher field strength possible
  – Fully automated = higher throughput


Applied Biosystems PRISM 377 (Gel, 34-96 lanes)

Applied Biosystems PRISM 3700 (Capillary, 96 capillaries)


PCR – polymerase chain reaction amplification of DNA

• PCR is the most routinely used method to amplify DNA
  – Exponential amplification of DNA by polymerases – Saiki et al, 1985
    • 2^n fold amplification, n = # cycles
      – 35 cycles = 2^35 = 3.4 x 10^10 fold
• Originally used DNA polymerase I
  – Needed to add fresh enzyme at every cycle because heat denaturation of template killed the enzyme
  – Not widely used – too painful to do manually
  – Nobel Prize in Chemistry to Kary Mullis in 1993 for inventing PCR; switching to heat-stable Taq DNA polymerase is what made it practical
    • He was middle author on paper!
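
The 2^n arithmetic on this slide can be checked directly. The sketch below also adds a per-cycle `efficiency` parameter, which is an assumption introduced here and not something the slide discusses: with efficiency below 1, each cycle multiplies the product by (1 + efficiency) rather than exactly 2.

```python
def fold_amplification(cycles: int, efficiency: float = 1.0) -> float:
    """Theoretical fold amplification after a number of PCR cycles.

    efficiency -- fraction of templates copied per cycle (1.0 = perfect
                  doubling); this parameter is an illustrative assumption
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    print(f"35 cycles, perfect doubling: {fold_amplification(35):.2e} fold")
    print(f"35 cycles, 90% efficiency:   {fold_amplification(35, 0.9):.2e} fold")
```

Even a modest drop in per-cycle efficiency reduces the final yield considerably, so the 2^n figure is best read as an upper bound.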


PCR – polymerase chain reaction amplification of DNA (contd)

Hot water bacteria: Thermus aquaticus
Taq DNA polymerase

Life at High Temperatures by Thomas D. Brock
Biotechnology in Yellowstone
© 1994 Yellowstone Association for Natural Science
http://www.bact.wisc.edu/Bact303/b27


Cycle sequencing – fusion of PCR and fluorescent ddNTP sequencing

• http://www.biology.uoc.gr/under/courses/cellfunc/cycseq.htm
• Combine PCR amplification with dideoxy sequencing – cycle sequencing
  – Linear amplification of template in the presence of fluorescent ddNTPs
  – When nucleotides are used up reaction is over
  – Separate on capillary electrophoresis instrument
  – Advantages
    • Fast, single tube reaction
    • Works with small amounts of starting material
  – Disadvantages
    • Still need to prepare high quality template to sequence
    • Cost and time
      – Many sequencing centers spend time, $$ on template prep
      – Automation requirements


Isothermal amplification – the solution to template preparation

• How to make template preparation faster, easier and more reliable?
  – Eliminate automation requirement, amplify starting material in some other way
  – Φ29 DNA polymerase (aka TempliPhi)
  – http://www1.amershambiosciences.com/aptrix/upp01077.nsf/Content/autodna_templiphi_intro
  – Enzyme has high processivity and strand displacement activity
    • Isothermal reaction produces huge quantities of DNA from tiny amount of input
    • More efficient than PCR (no temp change, no machine, no cleanup)


Modern DNA sequence analysis

• Cycle sequencing
  – Virtually all DNA sequencing today is done by cycle sequencing with fluorescent ddNTPs
    • ABI Big Dye chemistry
  – Template preparation still tedious for small scale
    • TempliPhi used in genome centers (obviated need for most automation)
  – Capillary sequencers predominant form of technology in use


DNA sequence analysis

• Landmarks in DNA sequencing
  – Sanger, Nicklen and Coulson. Sequencing with chain terminating inhibitors. Proc. Natl. Acad. Sci. 74, 5463-5467 (1977).
  – Sanger, F. et al. The nucleotide sequence of bacteriophage ΦX174. J Mol Biol 125, 225-46 (1978).
  – Sutcliffe, J. G. Complete nucleotide sequence of the Escherichia coli plasmid pBR322. Cold Spring Harb Symp Quant Biol 43, 77-90 (1979).
  – Sanger et al. Nucleotide sequence of bacteriophage lambda DNA. J Mol Biol 162, 729-73 (1982).
  – Messing, J., Crea, R. & Seeburg, P. H. A system for shotgun DNA sequencing. Nucl. Acids Res 9, 309-21 (1981).
  – Anderson, S. et al. Sequence and organization of the human mitochondrial genome. Nature 290, 457-65 (1981).
  – Deininger, P. L. Random subcloning of sonicated DNA: application to shotgun DNA sequence analysis. Anal Biochem 129, 216-23 (1983).
  – Baer et al. DNA sequence and expression of the B95-8 Epstein-Barr virus genome. Nature 310, 207-11 (1984). (189 kb)
  – Innis et al. DNA sequencing with Taq DNA polymerase and direct sequencing of PCR-amplified DNA. Proc. Natl. Acad. Sci. 85, 9436-9440 (1988).


DNA sequence analysis (contd)

• Landmarks in DNA sequencing (contd)
  – 1995 - Haemophilus influenzae (1.83 Mb)
    • first bacterium sequenced, human pathogen
  – 1995 - Mycoplasma genitalium (0.58 Mb)
    • smallest free living organism
  – 1996 - Saccharomyces cerevisiae genome (13 Mb)
  – 1996 - Methanococcus jannaschii (1.66 Mb)
    • first Archaebacterium
  – 1997 - Escherichia coli (4.6 Mb)
  – 1997 - Bacillus subtilis (4.2 Mb)
  – 1997 - Borrelia burgdorferi (1.44 Mb)
    • Lyme disease
  – 1997 - Archaeoglobus fulgidus (2.18 Mb)
    • first sulfur metabolizing bacterium
  – 1997 - Helicobacter pylori (1.66 Mb)
    • first bacterium to cause cancer


DNA sequence analysis (contd)

• Landmarks in DNA sequencing (contd)
  – 1998 - Treponema pallidum (1.14 Mb)
  – 1998 - Caenorhabditis elegans genome (97 Mb)
  – 1999 - Deinococcus radiodurans (3.28 Mb)
    • resistant to radiation, starvation, oxidative stress
  – 2000 - Drosophila melanogaster (120 Mb)
  – 2000 - Arabidopsis thaliana (115 Mb)
  – 2001 - Escherichia coli O157:H7 (4.1 Mb)
  – 2001 - Human “genome”
  – 2002 - mouse genome
  – 2002 - Ciona intestinalis
    • primitive chordate
  – 2004 - rat genome


DNA Sequence analysis

• Complete DNA sequence (all nts both strands, no gaps)
  – complete sequence is desirable but takes time
    • how long depends on size and strategy employed
  – which strategy to use depends on various factors
    • how large is the clone?
      – cDNA
      – genomic
    • How fast is sequence required?
• sequencing strategies
  – primer walking
  – cloning and sequencing of restriction fragments
  – progressive deletions
    • Bidirectional, unidirectional
  – Shotgun sequencing
    • whole genome
    • with mapping
      – map first (C. elegans)
      – map as you go (many)


DNA Sequence analysis (contd)

• Primer walking - walk from the ends with oligonucleotides
  – sequence, back up ~50 nt from end, make a primer and continue
    • Why back up?
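
A rough estimate of how many sequencing rounds a primer walk takes follows from the read length and the ~50 nt back-up. The sketch below assumes walking from one end only, a fixed usable read length of 700 bp (an assumed midpoint of typical reads), and that each new primer sits exactly `backup` bases before the end of the previous read. The function name and defaults are illustrative.

```python
import math


def walking_rounds(insert_length: int, read_length: int = 700, backup: int = 50) -> int:
    """Number of sequential reads needed to walk across an insert from one end.

    Each round after the first adds (read_length - backup) new bases,
    because the next primer is placed `backup` bases before the end
    of the previous read.
    """
    if backup >= read_length:
        raise ValueError("backup must be smaller than read_length")
    if insert_length <= read_length:
        return 1
    step = read_length - backup
    return 1 + math.ceil((insert_length - read_length) / step)


if __name__ == "__main__":
    for size in (500, 1500, 3000, 10000):
        print(f"{size} bp insert: {walking_rounds(size)} rounds from one end")
```

Walking in from both ends at once roughly halves the number of sequential rounds, but each round still waits on a new oligo, which is why the method scales poorly to large inserts.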


DNA Sequence analysis (contd)

• Primer walking (contd)
  – advantages
    • very simple
    • no possibility to lose bits of DNA, as can happen with
      – restriction mapping
      – deletion methods
    • no restriction map needed
    • best choice for short DNA
  – disadvantages
    • slowest method
      – about a week between sequencing runs
    • oligos are not free (and not reusable)
    • not feasible for large sequences
  – applications
    • cDNA sequencing when time is not critical
    • targeted sequencing
      – verification
      – closing gaps in sequences


DNA Sequence analysis (contd)

• Cloning and sequencing of restriction fragments
  – once the most popular method
    • make a restriction map, subclone fragments
    • sequence
  – advantages
    • straightforward
    • directed approach
    • can go quickly
    • cloned fragments often useful otherwise
      – RNase protection, nuclease mapping, in situ hybridization
  – disadvantages
    • possible to lose small fragments
      – must run high quality analytical gels
    • depends on quality of restriction map
      – mistaken mapping -> wrong sequence
    • restriction site availability
  – applications
    • sequencing small cDNAs
    • isolating regions to close gaps


DNA Sequence analysis (contd)

• nested deletion strategies - sequential deletions from one end of the clone
  – cut, close and sequence
    • Approach
      – make restriction map
      – use enzymes that cut in polylinker and insert
      – religate, sequence from end with restriction site
      – repeat until finished, filling in gaps with oligos
    • advantages
      – Fast, simple, efficient
    • disadvantages
      – limited by restriction site availability in vector and insert
      – need to make a restriction map


DNA Sequence analysis (contd)

• nested deletion strategies (contd)
  – Exonuclease III-mediated deletion
    • cut with polylinker enzyme
      – protect ends -
        » 3’ overhang
        » phosphorothioate
    • cut with enzyme between first cut and the insert
      – can’t leave 3’ overhang
    • timed digestions with Exonuclease III
    • stop reactions, blunt ends
    • ligate and size select recombinants
    • sequence
    • advantages
      – unidirectional
      – processivity of enzyme gives nested deletions


DNA Sequence analysis (contd)

• Nested deletion strategies
  – Exonuclease III-mediated deletion (contd)
    • disadvantages
      – need two unique restriction sites flanking insert on each side
      – best used successively to get > 10kb total deletions
      – may not get complete overlaps of sequences
        » fill in with restriction fragments or oligos
    • applications
      – method of choice for moderate size sequencing projects
        » cDNAs
        » genomic clones
      – good for closing larger gaps
• Small-scale sequence analysis – how is it practiced today?
  – Primer walking
  – ExoIII-mediated deletion with primer walking


Genome sequencing

• The problem
  – Genome sizes for most eukaryotes are large (10^8-10^9 bp)
  – High quality sequences only about 600-800 bp/pass
• The solution
  – Break genome into lots of bits and sequence them all
  – Reassemble with computer
• The benefit
  – Rapid increase in information about genome size, gene comparisons, etc
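
The "break into bits, reassemble with computer" idea can be sketched with a toy greedy assembler that repeatedly merges the two fragments sharing the longest suffix-prefix overlap. This is only an illustration under strong assumptions: error-free reads, a single strand, and no repeats. It is not the algorithm used by any particular genome project, and the example sequence is made up.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than min_len are ignored and reported as 0.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list, min_len: int = 3) -> list:
    """Merge reads greedily by best overlap; return the resulting contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads fully contained in another read
    contigs = [r for r in contigs
               if not any(r != other and r in other for other in contigs)]
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


if __name__ == "__main__":
    genome = "ATGGCGTACGTTAGCCTAGGATCCA"  # made-up 25 bp "genome"
    reads = [genome[12:22], genome[0:10], genome[17:25], genome[6:16]]
    print(greedy_assemble(reads))
```

Repeats break this simple scheme, because two different regions can share the same overlap, which is the practical reason larger inserts and maps are needed to order contigs.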


Genome sequencing (contd)

• Shotgun sequencing NOT invented by Craig Venter
  – Messing 1981 first description of shotgun
  – Sanger lab developed current methods in 1983
  – approach
    • blast genome into small chunks
    • clone these chunks
      – 3-5 kb, 8 kb plasmid
      – 40 kb fosmid (to jump repetitive sequences)
    • sequence + assemble by computer
  – A priori difficulties
    • how to get nice uniform distribution
    • how to assemble fragments
    • what to do about repeats?
    • How to minimize sequence redundancy?


Genome sequencing(contd)


Genome sequencing(contd)


Genome sequencing (contd)

• Shotgun sequencing (contd)
  – How to minimize sequence redundancy?
    • Best way to minimize redundancy is map before you start
      – C. elegans was done this way - when the sequence was finished, it was FINISHED
        » mapping took almost 10 years
      – mapping much too tedious and nonprofitable for Celera
        » who cares about redundancy, let’s sequence and make $$
    • why does redundancy matter?
      – Finished sequence today costs about $0.50/base


Genome sequencing (contd)

– Mapping by fingerprinting

– Mapping by hybridization


Traditional (map first) vs STC (map as you go along) mapping


The human genome

• On Feb 12, 2001, Celera and the Human Genome Project published “draft” human genome sequences
  – Celera -> 39114
  – Ensembl -> 29691
  – Consensus from all sources ~30K
• Number of genes
  – C. elegans – 19,000
  – Arabidopsis – 25,000
• Predictions had been from 50-140k human genes
  – What’s up with that?
  – Are we only slightly more complicated than a weed?
  – How can we possibly get a human with less than 2x the number of genes as C. elegans?
  – Implications?
    • UNRAVELING THE DNA MYTH: The spurious foundation of genetic engineering, Barry Commoner, Harpers Magazine Feb, 2002


The human genome

• The answer – Sloppy science
  – Gene sets don’t overlap completely
  – Floor is 42K
  – 105,680 UniGene clusters from ESTs (down from 128,826 last year)

= 42113


Genome sequencing(contd)

– Whole genome shotgun sequencing (Celera)
  • premise is that rapid generation of draft sequence is valuable
  • why bother trying to clone and sequence difficult regions?
    – Basically just forget regions of repetitive DNA - not cost effective
  • using this approach, genome is alleged to be 90% finished
    – rule of thumb is that it takes at least as long to finish the last 5% as it took to get the first 95%
  • problems
    – sequence may never be complete the way the C. elegans sequence is
    – much redundant sequence with many sparse regions and lots of gaps
    – Fragment assembly for regions of highly repetitive DNA is dubious at best
    – “Finished” fly and human genomes lack more than a few already characterized genes


The human genome

• How finished is the human genome sequence?
  – Draft sequence to high coverage
  – Chromosome by chromosome finishing now
    • Chr 22 – 1999
    • Chr 21 – 2000
    • Chr 20 – 2001
    • Chr 15 – 2003
    • Chr 6, 7, Y – 2003
    • Chr 13, 19 – 2004


Genome sequencing (contd)

• Knowing what we know now – how to approach a large new genome?
  – Xenopus tropicalis 1.7 Gb (about ½ human)
  – BAC end sequencing
  – Whole genome shotgun
  – Gaps closed with BACs
  – 8x coverage by end of 2004
  – Finishing dependent on additional funding
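
What 8x coverage means in practice can be estimated with simple arithmetic. The sketch below computes the number of reads needed for a given coverage and, using the idealized Lander-Waterman model (reads placed uniformly at random, no repeats or cloning bias), the expected fraction of bases left uncovered, e^(-c). The 700 bp read length is an assumed value, and the model is a standard approximation rather than anything stated on the slide.

```python
import math


def reads_needed(genome_size_bp: int, coverage: float, read_length_bp: int = 700) -> int:
    """Reads required so that total sequenced bases = coverage x genome size."""
    if read_length_bp <= 0:
        raise ValueError("read_length_bp must be positive")
    return math.ceil(genome_size_bp * coverage / read_length_bp)


def expected_uncovered_fraction(coverage: float) -> float:
    """Idealized fraction of the genome hit by no read at all (e^-c)."""
    return math.exp(-coverage)


if __name__ == "__main__":
    genome = 1_700_000_000  # Xenopus tropicalis, 1.7 Gb as given on the slide
    for c in (1, 4, 8):
        n = reads_needed(genome, c)
        missing = expected_uncovered_fraction(c) * genome
        print(f"{c}x: about {n:,} reads, roughly {missing:,.0f} bp uncovered in the ideal case")
```

Real genomes fall short of this ideal because repeats and unclonable regions are not sampled at random, which is why gaps are closed with BACs even at high coverage.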


Genome sequencing

• DOE – Joint Genome Institute
  – http://www.jgi.doe.gov/
  – Numerous advances in sequencing technology
    • Increased pass rate from ~70% to > 90%
    • Lowered cost nearly 3 fold


Useful software for molecular biology (contd)

• NCBI – www.ncbi.nlm.nih.gov
  – main information and analysis resource
  – indispensable resource


Useful software for molecular biology (contd)

• NCBI – Blast – how to find similar genes
  • www.ncbi.nlm.nih.gov/BLAST/


Useful software for molecular biology (contd)

• Why pay Celera?