Lecture 2. Genome sequencing: What good is it? 9/2/09



The hope

If we just had an organism’s genome sequence, would we understand what makes it tick?

- and how it can see, smell, hear, touch and taste!

… and dream

… and we could fix it when it can’t see, hear, smell, touch, or taste

By looking at the genome, can we know what an organism looks like?

Kasahara et al. 2007


Can we go from genotype to phenotype?

Questions for today

1. What is a genome?
2. How big is a genome?
3. How do you sequence a genome?
4. How much sequence is needed?
5. How much does it cost?
6. What genomes have been sequenced?

Q#1. What is a genome?

Genome = genes + chromosomes (Hans Winkler, 1920)

A genome is the “full complement of DNA molecules possessed by an organism”*: more than ‘all the genes’.

* Brown, “Genetics: A Molecular Approach”

What else is in a genome besides genes?

- Regulatory DNA: promoters, enhancers
- Spacers: introns, intergenic regions, telomeres
- Junk: pseudogenes, repetitive DNA

Can an organism have more than one genome?

- Nuclear
- Organelles: mitochondrial, chloroplast

Genome is made of nucleotides

[Figure: nucleotide structure with the 1’, 3’ and 5’ carbons labeled]

All information is contained in 4 bases

A, G, C, T

Two strands of DNA held together by hydrogen bonds between the bases

- Purines: adenine, guanine
- Pyrimidines: cytosine, thymine
- Bonds: A=T (2 hydrogen bonds), C≡G (3 hydrogen bonds)

Strands are antiparallel

[Figure: the two strands run in opposite directions, one 5’→3’ and the other 3’→5’]

Sequence is reported for only one strand

[Figure: a single strand written 5’→3’]
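Because the strands are antiparallel and bases pair A with T and C with G, the strand that is not reported can always be recovered: complement each base, then reverse the order so the result also reads 5’→3’. A minimal Python sketch (the function name is just for illustration):

```python
# Map each base to its pairing partner: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the opposite strand of seq, written 5'->3' like the input."""
    return seq.upper().translate(COMPLEMENT)[::-1]

print(reverse_complement("GATTACA"))  # TGTAATC
```

Applying it twice returns the original sequence, which is why reporting one strand loses no information.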

Genomic DNA = nuclear DNA arranged in chromosomes

Autosomes come in pairs

Sex chromosomes can have one or two copies

Fluorescent in situ hybridization to paint human chromosomes

Which sex should you sequence?


Q#2. How big is a genome?

Vertebrate genome sizes (genome sequence)

| Species | Genome size (10^9 bp) | Chromosome # (2n) |
|---|---|---|
| Human | 3 | 46 |
| Mouse | 2.7 | 40 |
| Chicken | 1.1 | 78 |
| Puffer fish | 0.4 | 44 |
| Zebrafish | 1.4 | 50 |
| Lamprey | 2.4 | 168 |

Non-vertebrate genome sizes

| Species | Genome size (10^8 bp) | Chromosome # (2n) |
|---|---|---|
| Mosquito (yellow fever) | 14 | 6 |
| Mosquito (malaria) | 2.8 | 6 |
| Drosophila | 1.3 | 8 |
| Bee | 1.9 | 32 |
| Nematode worm | 1 | 12 |
| Yeast | 0.12 | 32 |

Q#3: How do we sequence a genome?

Human Genome Project (HGP) method: clone by clone
- Make a genome map
- Create a library of clones
- Sequence clones from the golden path
- Use the genome map to check the assembly and deal with gaps

A. Make a genome map

Genetic map shows how markers are arranged on each chromosome

Marker 1: alleles A and B
Marker 2: alleles C and D
Marker 3: alleles G and H

Look at marker association in a family. The closer two markers are, the more likely they are to be inherited together.

Genotype markers for each individual in a family

Marker 1-2: 1/10 recombinants
Marker 2-3: 2/10 recombinants

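[Figure: family genotypes for markers 1, 2 and 3]

Parents: ACGG, BDHH

Offspring (20): ACGG ×7, BDHH ×7, ACHH ×2, BDGG ×2, ADHH ×1, BCGG ×1
- Recombinant between markers 1 and 2: ADHH, BCGG (2/20 = 1/10)
- Recombinant between markers 2 and 3: ACHH ×2, BDGG ×2 (4/20 = 2/10)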
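The recombinant counts above can be checked with a short Python sketch. It assumes the parental haplotypes are A-C-G and B-D-H and encodes each child by its alleles at markers 1, 2 and 3, using the offspring counts read off the family figure; `PARENTAL`, `OFFSPRING` and `recombination_fraction` are hypothetical names.

```python
from collections import Counter

# Parental haplotypes as (marker 1, marker 2, marker 3) alleles.
PARENTAL = [("A", "C", "G"), ("B", "D", "H")]

# Offspring haplotypes and how many children carry each (20 in total).
OFFSPRING = Counter({
    ("A", "C", "G"): 7, ("B", "D", "H"): 7,
    ("A", "C", "H"): 2, ("B", "D", "G"): 2,
    ("A", "D", "H"): 1, ("B", "C", "G"): 1,
})

def is_recombinant(haplotype, i, j):
    """True if markers i and j carry an allele pairing found in neither parent."""
    return all((haplotype[i], haplotype[j]) != (p[i], p[j]) for p in PARENTAL)

def recombination_fraction(offspring, i, j):
    """Fraction of offspring that are recombinant between markers i and j."""
    total = sum(offspring.values())
    recombinants = sum(n for h, n in offspring.items() if is_recombinant(h, i, j))
    return recombinants / total

print(recombination_fraction(OFFSPRING, 0, 1))  # markers 1-2: 0.1
print(recombination_fraction(OFFSPRING, 1, 2))  # markers 2-3: 0.2
```

For closely linked markers, a recombination fraction of 0.1 corresponds to roughly 10 centimorgans (cM) of genetic map distance.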

Genetic maps have many uses

[Figure: genetic map with Markers 1, 2 and 3]

- Road map to the genome
- Can be used to find loci that cause a disease or a particular trait

B. Make a BAC clone library

- Cut genome into 100-200 kb pieces
- Clone each piece into a vector
- Transform bacteria
- The vector acts like a separate chromosome = bacterial artificial chromosome (BAC)

Plate bacteria

Pick colonies into plates

Stack of plates becomes the library

Use a robot

How many plates in a library?

Human genome = 3 × 10^9 bp
If 100,000 bp/clone: 3 × 10^9 / 10^5 = 3 × 10^4 clones
If 384 wells/plate: 3 × 10^4 / 384 ≈ 78 plates

This is 1x coverage. For 10x coverage, that is 780 plates, which take up 2 shelves of a -80 °C freezer.
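The plate arithmetic can be checked with a few lines of Python; the genome size, insert size and wells per plate are the slide's round numbers, and `plates_needed` is an illustrative name.

```python
# Assumed round numbers from the slide: human genome, 100 kb BAC inserts, 384-well plates.
GENOME_BP = 3e9
INSERT_BP = 100_000
WELLS_PER_PLATE = 384

def plates_needed(coverage: float) -> float:
    """Number of 384-well plates needed to hold a BAC library at the given fold coverage."""
    clones = coverage * GENOME_BP / INSERT_BP
    return clones / WELLS_PER_PLATE

for cov in (1, 10):
    print(f"{cov}x: {plates_needed(cov):.1f} plates")
```

Rounding down gives the slide's 78 plates per 1x of coverage, or about 780 for 10x.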

B. Library covers genome multiple times

- Many clones cover any one location
- Clones overlap with each other
- Make a physical map of their relationships

[Figure: overlapping BAC clones, 200 kb scale]

The golden path

- Pick clones that just cover the genome once: the minimum tiling path
- Sequence just those clones
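Choosing the golden path is an interval-cover problem. A minimal greedy sketch, assuming each clone's position on the physical map is known as (start, end) coordinates; the clone names and coordinates are made up, and `minimum_tiling_path` is a hypothetical name.

```python
def minimum_tiling_path(clones, chrom_length):
    """Choose the fewest clones whose (start, end) intervals cover [0, chrom_length).

    clones: list of (name, start, end) in map coordinates, end exclusive.
    Returns the chosen clone names in order, or None if the clones leave a gap.
    """
    clones = sorted(clones, key=lambda c: c[1])   # order clones by start position
    path, covered, i = [], 0, 0
    while covered < chrom_length:
        best = None
        # Of all clones starting inside the covered region (or exactly at its edge),
        # keep the one that extends furthest to the right.
        while i < len(clones) and clones[i][1] <= covered:
            if best is None or clones[i][2] > best[2]:
                best = clones[i]
            i += 1
        if best is None or best[2] <= covered:
            return None   # no clone extends past the covered edge: a gap in the map
        path.append(best[0])
        covered = best[2]
    return path

# Hypothetical BAC clones with map coordinates in kb.
clones = [("BAC1", 0, 180), ("BAC2", 50, 200), ("BAC3", 150, 350),
          ("BAC4", 190, 390), ("BAC5", 300, 500)]
print(minimum_tiling_path(clones, 500))  # ['BAC1', 'BAC3', 'BAC5']
```

The greedy rule (always extend with the clone that reaches furthest) is a standard way to pick a minimum set of intervals covering a line. It only uses map coordinates; in practice neighbouring clones also need enough overlap for their sequences to be joined.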

C. Sequence each BAC clone

- Cut each BAC clone into 2 kb fragments
- Subclone them into a different vector
- Plate and pick
- Sequence the ends of each subclone

Sanger sequencing reaction

Electrophoresis: separate fragments by size
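The logic of chain termination followed by size separation can be sketched in a few lines. This is a toy model, assuming every position of the newly synthesized strand is terminated by a labelled ddNTP in some molecule and that electrophoresis resolves fragments differing by a single base; the function names are made up for illustration.

```python
import random

def sanger_fragments(new_strand):
    """Every chain-terminated fragment as (length in bases, labelled ddNTP at its 3' end).

    Toy model: one fragment per possible length, i.e. termination happens at
    every position of the newly synthesized strand in some molecule.
    """
    return [(i + 1, base) for i, base in enumerate(new_strand)]

def read_gel(fragments):
    """Electrophoresis orders fragments by size; read the end label from shortest to longest."""
    return "".join(base for _, base in sorted(fragments))

fragments = sanger_fragments("GATTACA")
random.shuffle(fragments)   # the reaction yields an unordered mixture of fragments
print(read_gel(fragments))  # GATTACA
```

The read is the sequence of the newly synthesized strand, which is complementary to the template.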

Sequence chromatogram

From a single sequence read you can get 600-700 bp.

Get sequence reads from each end of the subclone

[Figure: 3 kb subclone with 600-700 bp end reads]

Get the sequence of the ends of each subclone

Assemble end reads to get BAC clone sequence
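Assembling end reads is a jigsaw of overlaps. As an illustration only, here is a toy greedy assembler that repeatedly merges the two sequences with the longest suffix-prefix overlap. It assumes error-free reads all taken from the same strand and ignores repeats, so it is not how production assemblers work; the function names and reads are made up.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair of sequences until no overlaps remain; return the contigs."""
    contigs = list(reads)
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs   # nothing left to join: each string is a separate contig
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)

print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCGGA"]))  # ['ATGGCGTACCGGA']
```

Reads that share no overlap stay as separate contigs, which is where the gaps in an assembly come from.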

Assemble the golden path sequence

Produces sequence along each chromosome; there can still be gaps

D. Check BAC clones against genetic map

Markers in each clone should fall in the right location on the genetic map.

[Figure: BAC clone sequence aligned to the genetic map]
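One simple form of this check compares marker order. A minimal sketch, assuming we know each marker's order on the genetic map and the base-pair position where it was found in the assembled sequence, and that both run in the same orientation; `out_of_order_markers` and the example markers are hypothetical.

```python
def out_of_order_markers(genetic_map, sequence_positions):
    """Return adjacent marker pairs whose order in the sequence disagrees with the genetic map.

    genetic_map: marker names in their order along the chromosome.
    sequence_positions: marker name -> position (bp) where it was found in the assembly.
    Markers not found in the sequence are skipped; an empty result means the order agrees.
    """
    found = [m for m in genetic_map if m in sequence_positions]
    return [(a, b) for a, b in zip(found, found[1:])
            if sequence_positions[a] > sequence_positions[b]]

genetic_map = ["M1", "M2", "M3", "M4"]
assembly = {"M1": 12_000, "M2": 250_000, "M3": 180_000, "M4": 400_000}
print(out_of_order_markers(genetic_map, assembly))  # [('M2', 'M3')]
```

A flagged pair points to a possible mis-assembly or mis-placed clone between those two markers. Because the sketch assumes a shared orientation, a clone placed backwards would flag every pair inside it.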

E. Finish the genome

Use PCR and targeted cloning to fill in all the gaps and get every base

That is a huge effort!

Current status of the human genome

Craig Venter’s way

Whole genome shotgun: skip the BAC-by-BAC approach
- Make 3 different libraries:
  - 3 kb: most of the clones
  - 10 kb: a few
  - 50-200 kb: very few
- Sequence ends of all clones

Sequences will produce contigs

- Huge jigsaw puzzle: multiple reads overlap and so can be put together
- Contigs are pieces of contiguous DNA, mostly from the 3 kb library
- There will be gaps where sequence is missing

Use “paired end reads” to figure out how these contigs go together

Large libraries provide long range order
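Paired end reads link contigs because the distance between the two reads is roughly known from the library's insert size. A minimal sketch, assuming contig A precedes contig B in the scaffold, each pair has one read on A and its mate on B, coordinates are 0-based, and the insert size is the library's nominal value (3,000 bp here); `estimate_gap` and the example numbers are hypothetical.

```python
from statistics import mean

def estimate_gap(insert_size, contig_a_len, pairs):
    """Estimate the gap (bp) between contig A and the following contig B.

    pairs: (start of read 1 on contig A, end of read 2 on contig B) for each
    read pair linking the two contigs. The insert spans the tail of A, the
    unknown gap and the head of B, so gap = insert - tail - head.
    """
    gaps = [insert_size - (contig_a_len - start_a) - end_b
            for start_a, end_b in pairs]
    return mean(gaps)  # averaging several pairs smooths out insert-size variation

# Three read pairs from a hypothetical 3 kb library linking a 10,000 bp contig A to contig B.
print(estimate_gap(3000, 10_000, [(8800, 1300), (8700, 1300), (8900, 1300)]))  # 500
```

A negative estimate would suggest the two contigs actually overlap. Pairs from the 10 kb and larger libraries can bridge bigger gaps, which is why they supply the long range order.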

Whole genome shotgun

[Figure: 3 kb, 10 kb and 100 kb libraries]

Cheaper process, but can end up with more holes

Sequencing Craig Venter’s Genome October 2007

1st personal genome

Q#4. How much sequence is needed to cover the genome?

The probability that a given base is sequenced x times follows a Poisson distribution:

$$P(x, \lambda) = \frac{e^{-\lambda}\,\lambda^{x}}{x!}$$

where P is the probability, x is the number of times a base is sequenced, and λ is the genomic coverage (e.g. 4x or 10x).
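The Poisson formula can be evaluated directly to reproduce the 1x and 2x tables that follow. A minimal sketch using only the standard library; `poisson` is an illustrative name.

```python
import math

def poisson(x, lam):
    """Probability that a given base is sequenced exactly x times at coverage lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)

for coverage in (1, 2):
    row = ", ".join(f"{x}: {100 * poisson(x, coverage):.1f}%" for x in range(6))
    print(f"{coverage}x -> {row}")
```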

Poisson distribution - 1x coverage

[Figure: fraction of bases vs. number of times a base is sequenced]

| # times sequenced | % of genome (1x) |
|---|---|
| 0 | 37% |
| 1 | 37% |
| 2 | 18% |
| 3 | 6% |
| 4 | 1.5% |
| 5 | 0.3% |

Poisson distribution - 2x coverage

[Figure: fraction of bases vs. number of times a base is sequenced]

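| # times sequenced | % of genome (1x) | % of genome (2x) |
|---|---|---|
| 0 | 37% | 13.5% |
| 1 | 37% | 27% |
| 2 | 18% | 27% |
| 3 | 6% | 18% |
| 4 | 1.5% | 9% |
| 5 | 0.3% | 4% |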

[Figure: Poisson distributions for 1x, 2x, 4x, 7x and 10x coverage; x-axis: times a given base is sequenced; y-axis: probability]

Fraction of genome not covered:

| Coverage | Not covered |
|---|---|
| 1x | 37% |
| 2x | 14% |
| 4x | 2% |
| 7x | 0.1% |
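Setting x = 0 in the Poisson formula gives the fraction of the genome that is never sequenced, e^(-λ). Inverting it gives the coverage needed to reach a target; the helper names below are hypothetical.

```python
import math

def fraction_uncovered(coverage):
    """Poisson P(0) = e^-coverage: expected fraction of bases never sequenced."""
    return math.exp(-coverage)

def coverage_for(target_uncovered):
    """Fold coverage needed so that only target_uncovered of the genome is missed."""
    return -math.log(target_uncovered)

for c in (1, 2, 4, 7, 10):
    print(f"{c}x: {100 * fraction_uncovered(c):.3g}% not covered")
print(f"{coverage_for(0.001):.1f}x needed to miss only 0.1%")
```

For example, `coverage_for(0.001)` is about 6.9, so roughly 7x coverage leaves only 0.1% of the genome unsequenced, in line with the 7x row.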

Sequence quality

Each base should be sequenced several times so that we have good confidence in the sequence.
- 6-8x is the rule of thumb
- The human genome was sequenced to 8-10x coverage
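Why several reads per base help can be made concrete with a majority-vote calculation. This sketch assumes each read miscalls a given base independently with probability 1% (an illustrative number, not from the lecture) and, conservatively, that every wrong read votes against the correct base, with ties counted as failures; `consensus_error` is a hypothetical name.

```python
from math import comb

def consensus_error(n_reads, read_error):
    """Upper bound on the chance that a majority-vote base call is wrong.

    Assumes independent read errors with probability read_error, that every
    wrong read votes against the correct base, and that ties count as failures.
    """
    return sum(comb(n_reads, k) * read_error**k * (1 - read_error)**(n_reads - k)
               for k in range((n_reads + 1) // 2, n_reads + 1))

for n in (1, 3, 6, 8):
    print(n, f"{consensus_error(n, 0.01):.2e}")
```

Under these assumptions the chance of a wrong consensus falls by orders of magnitude as coverage rises from 1 to 8 reads.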

How long does it take? Sanger sequencing

Human genome = 3 × 10^9 bp
10x coverage = 3 × 10^10 bp
# reads = 3 × 10^10 bp / 500 bp = 6 × 10^7 reads
A machine takes 2 hrs for 96 reads
6 × 10^7 reads / 96 reads × 2 hrs = 1.25 × 10^6 machine-hours
A genome center has 300 machines and runs 24/7:
1.25 × 10^6 hrs / 300 machines / 24 hrs = 174 days ≈ 6 months
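The same arithmetic written as a function, so the slide's assumptions (500 bp reads, 96 reads per 2 hr run, 300 machines running 24/7) can be varied; `sanger_days` is a hypothetical name.

```python
def sanger_days(genome_bp=3e9, coverage=10, read_bp=500,
                reads_per_run=96, hours_per_run=2, machines=300):
    """Return (reads needed, machine-hours, calendar days) for a Sanger genome project."""
    reads = genome_bp * coverage / read_bp
    machine_hours = reads / reads_per_run * hours_per_run
    days = machine_hours / machines / 24   # machines run around the clock
    return reads, machine_hours, days

reads, hours, days = sanger_days()
print(f"{reads:.1e} reads, {hours:.3g} machine-hours, {days:.0f} days")
```

Doubling the number of machines, for example `sanger_days(machines=600)`, halves the calendar time.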

Q#5. What does it cost?

Sequencing costs for a typical vertebrate

| Item | Value |
|---|---|
| Average genome size (bp) | 1 × 10^9 |
| Typical sequence coverage (bp) | 5x = 5 × 10^9 |
| # reads needed (500 bp) | 1 × 10^7 |
| Cost of 1 read | $1 |
| Cost of project | $10M |
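The cost estimate follows from the same read count; a minimal sketch with the table's values as defaults (`project_cost` is an illustrative name).

```python
def project_cost(genome_bp=1e9, coverage=5, read_bp=500, cost_per_read=1.0):
    """Return (reads needed, total dollars) for a Sanger project at the given coverage."""
    reads = genome_bp * coverage / read_bp
    return reads, reads * cost_per_read

reads, dollars = project_cost()
print(f"{reads:.0e} reads, ${dollars / 1e6:.0f}M")
```

Plugging in the human genome (3 × 10^9 bp) at 10x gives 6 × 10^7 reads, or about $60M at the same $1 per read.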

New sequencing methods

- Human Genome Project relied on capillary electrophoresis using Sanger sequencing: 96 pieces sequenced at a time (1 plate)
- New methods get rid of electrophoresis: 100,000s of pieces sequenced at a time
- Massively parallel sequencing by synthesis
- Companies: 454 Life Sciences, Illumina, others

Sequencing Jim Watson’s genome, April 2008


HGP vs Venter vs Watson

| | HGP | Venter | Watson |
|---|---|---|---|
| Method | Clone by clone | Whole genome shotgun | 454 Life Sciences |
| Coverage | 8-10x | 7.5x | 7.4x |
| Time | 13 yrs | 4 yrs | 4.5 months |
| Cost | $2.7B | $100M | $1.5M |
| Authors | 2800 | 31 | 27 |

How soon till the $1000 genome???

[Figure: cost for a genome (log scale, 10^0 to 10^10) by year, 2002-2012]

Q#6. What genomes have been sequenced?

NIH genome page: http://www.ncbi.nlm.nih.gov/Genomes/

Sanger genome links: http://www.ensembl.org/index.html

Vertebrates

Vertebrate genomes sequenced

Divergence times in MY separating different species


All vertebrates in the sequencing pipeline (2007)

Assignment for Wednesday

1. Why sequence a genome?

2. Think about a genome that you would like to see sequenced.

Use the NIH genome site: http://www.genome.gov/10002154

See class website

No class for Labor Day - I am away for the weekend.