The hope
If we just had an organism’s genome sequence, would we understand what makes it tick?
- and how it can see, smell, hear, touch and taste!
By looking at the genome, can we know what an organism looks like?
Kasahara et al. 2007
Can we go from genotype to phenotype?
Questions for today
1. What is a genome?
2. How big is a genome?
3. How do you sequence a genome?
4. How much sequence is needed?
5. How much does it cost?
6. What genomes have been sequenced?
Q#1. What is a genome?
Genome = genes + chromosomes (term coined by Hans Winkler in 1920)
A genome is the “full complement of DNA molecules possessed by an organism”*: more than ‘all the genes’
* Brown, “Genetics: A Molecular Approach”
What else is in a genome besides genes?
Regulatory DNA: promoters, enhancers
Spacers: introns, intergenic regions, telomeres
Junk: pseudogenes, repetitive DNA
Can an organism have more than one genome?
Nuclear
Organelles: mitochondrial, chloroplast
Two strands of DNA held together by hydrogen bonds between the bases
Purines: adenine, guanine
Pyrimidines: cytosine, thymine
Bonds: A=T (two hydrogen bonds), C≡G (three hydrogen bonds)
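The pairing rules can be sketched in a few lines of Python. This is an illustrative sketch, not part of the original slides: the function names `reverse_complement` and `hydrogen_bonds` are hypothetical, and the code assumes an uppercase sequence containing only A, C, G and T. Because the two strands run antiparallel, the partner strand is read in reverse.

```python
# Watson-Crick pairing: each purine (A, G) pairs with a pyrimidine (T, C).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hydrogen bonds contributed by each base pair: A=T has two, C-G has three.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3' (strands are antiparallel)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def hydrogen_bonds(seq: str) -> int:
    """Count the hydrogen bonds holding this stretch of double-stranded DNA together."""
    return sum(H_BONDS[base] for base in seq.upper())


if __name__ == "__main__":
    strand = "ATGCGC"
    print(reverse_complement(strand))
    print(hydrogen_bonds(strand))
```

For `ATGCGC` the partner strand is `GCGCAT`, held by 2 + 2 + 4 x 3 = 16 hydrogen bonds, which is why GC-rich stretches are harder to separate.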
Genomic DNA = nuclear DNA arranged in chromosomes
Autosomes come in pairs
Sex chromosomes can have one or two copies
Fluorescent in situ hybridization to paint human chromosomes
Which sex should you sequence?
Vertebrate genome sizes (genome sequence)
Species        Genome size (10^9 bp)   Chromosome # (2n)
Human          3                       46
Mouse          2.7                     40
Chicken        1.1                     78
Puffer fish    0.4                     44
Zebrafish      1.4                     50
Lamprey        2.4                     168
Non-vertebrate genome sizes
Species                    Genome size (10^8 bp)   Chromosome # (2n)
Mosquito (yellow fever)    14                      6
Mosquito (malaria)         2.8                     6
Drosophila                 1.3                     8
Bee                        1.9                     16?
Nematode worm              1                       12
Yeast                      0.12                    32
Q#3. How do we sequence a genome?
Human Genome Project (HGP) method: clone by clone
- Make a genome map
- Create a library of clones
- Sequence clones from the golden path
- Use the genome map to check the assembly and deal with gaps
A. Make a genome map
Genetic map shows how markers are arranged on each chromosome
Marker 1: alleles A, B
Marker 2: alleles C, D
Marker 3: alleles G, H
Look at marker association in a family. The closer two markers are, the more likely they will be inherited together
Genotype markers for each individual in a family
Marker 1-2: 1/10 recombinants
Marker 2-3: 2/10 recombinants
Parents: ACGG, BDHH
Offspring (20): ACGG x 7, BDHH x 7, ACHH x 2, BDGG x 2, ADHH x 1, BCGG x 1
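The recombinant counts can be reproduced by tallying the offspring. A minimal Python sketch, assuming the offspring counts read off the slide (ACGG x 7, BDHH x 7, ACHH x 2, BDGG x 2, ADHH x 1, BCGG x 1), that ACG and BDH are the two parental haplotypes, and that only the first three letters (markers 1 to 3) matter. The helper names are hypothetical.

```python
from collections import Counter

# Offspring haplotypes from the family, as counts. Letters 1-3 are the alleles at
# markers 1, 2 and 3; the repeated fourth letter is ignored.
OFFSPRING = Counter({"ACGG": 7, "BDHH": 7, "ACHH": 2, "BDGG": 2, "ADHH": 1, "BCGG": 1})

# Assumption: the two parental (non-recombinant) haplotypes are ACG and BDH.
PARENTAL = ("ACG", "BDH")


def parent_of(allele: str, marker: int) -> int:
    """Index of the parental haplotype carrying `allele` at `marker` (0-based)."""
    for index, haplotype in enumerate(PARENTAL):
        if haplotype[marker] == allele:
            return index
    raise ValueError(f"allele {allele!r} not found at marker {marker + 1}")


def recombination_fraction(m1: int, m2: int) -> float:
    """Fraction of offspring whose alleles at m1 and m2 came from different parental haplotypes."""
    total = sum(OFFSPRING.values())
    recombinants = sum(
        count
        for haplotype, count in OFFSPRING.items()
        if parent_of(haplotype[m1], m1) != parent_of(haplotype[m2], m2)
    )
    return recombinants / total


if __name__ == "__main__":
    for a, b in [(0, 1), (1, 2), (0, 2)]:
        r = recombination_fraction(a, b)
        # For closely linked markers, 1% recombination is roughly 1 centimorgan (cM).
        print(f"Marker {a + 1}-{b + 1}: {r:.2f} recombinant (~{r * 100:.0f} cM)")
```

With these counts the sketch gives 0.1 for markers 1-2, 0.2 for markers 2-3 and 0.3 for markers 1-3: the fractions add up along the chromosome, consistent with the marker order 1-2-3.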
Genetic maps have many uses
[Figure: Marker 1, Marker 2 and Marker 3 placed along a chromosome]
Road map to the genome
Can be used to find loci that cause a disease or a particular trait
B. Make a BAC clone library
Cut genome into 100-200 kb pieces
Clone each piece into a vector
Transform bacteria
Acts like a separate chromosome = bacterial artificial chromosome (BAC)
[Figure: BAC vector]
How many plates in a library?
Human genome = 3 x 10^9 bp
If 100,000 bp / clone = 3 x 10^4 clones
If 384 wells / plate = 78 plates
This is 1x coverage. If you want 10x coverage, that is 780 plates.
This takes up 2 shelves of a -80 °C freezer.
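The plate arithmetic can be written as a small function. A sketch under the slide's assumptions (100 kb per clone, 384-well plates); the function name `library_plates` is hypothetical.

```python
import math


def library_plates(genome_bp: float, insert_bp: float, wells_per_plate: int = 384,
                   coverage: float = 1.0) -> tuple[float, float]:
    """Return (clones, plates) needed for a clone library at the given coverage."""
    clones = coverage * genome_bp / insert_bp  # each clone holds one insert
    plates = clones / wells_per_plate          # one clone per well
    return clones, plates


if __name__ == "__main__":
    for cov in (1, 10):
        clones, plates = library_plates(3e9, 100_000, coverage=cov)
        print(f"{cov}x: {clones:,.0f} clones, {plates:.1f} plates "
              f"(round up to {math.ceil(plates)} whole plates)")
```

The exact values are 78.125 plates at 1x and 781.25 at 10x, so rounding up to whole plates gives 79 and 782; the slide's 78 and 780 are rounded figures.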
B. Library covers genome multiple times
Many clones cover each location
Clones overlap with each other
Make a physical map of their relationships
The golden path
Pick clones that just cover the genome once
Minimum tiling path: sequence just those clones
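Picking the golden path can be framed as covering the genome with the fewest overlapping clone intervals. A minimal sketch, assuming each clone's position on the physical map is known as a (start, end) pair in kb; the clone names and coordinates are hypothetical. The greedy rule (always extend with the clone that reaches furthest right) is the textbook interval-cover method and gives a minimum set for exact intervals; it is not necessarily how the HGP chose its clones.

```python
def minimum_tiling_path(clones: dict[str, tuple[int, int]], start: int, end: int) -> list[str]:
    """Pick the fewest clones whose (start, end) intervals together cover [start, end]."""
    # Sort clones by their left end on the physical map.
    ordered = sorted(clones.items(), key=lambda item: item[1][0])
    path: list[str] = []
    covered = start  # everything left of this position is already covered
    i = 0
    while covered < end:
        best_name, best_end = None, covered
        # Among clones starting at or before the covered point, keep the one reaching furthest right.
        while i < len(ordered) and ordered[i][1][0] <= covered:
            name, (_, clone_end) = ordered[i]
            if clone_end > best_end:
                best_name, best_end = name, clone_end
            i += 1
        if best_name is None:
            raise ValueError(f"no clone covers position {covered}: gap in the physical map")
        path.append(best_name)
        covered = best_end
    return path


if __name__ == "__main__":
    bacs = {"BAC1": (0, 180), "BAC2": (50, 220), "BAC3": (150, 340),
            "BAC4": (200, 390), "BAC5": (330, 500)}
    print(minimum_tiling_path(bacs, 0, 500))
```

For these example clones the path is BAC1, BAC3, BAC5; BAC2 and BAC4 add nothing that the chosen clones do not already cover.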
C. Sequence each BAC clone
Cut each BAC clone into 2-3 kb fragments
Subclone them into a different vector, plate and pick
Sequence the ends of each subclone
Get sequence reads from each end of the subclone
[Figure: 3 kb subclone with 600-700 bp end reads]
Assemble the golden path sequence
Produces sequence along each chromosome; there can still be gaps
D. Check BAC clones against the genetic map
[Figure: BAC clone sequences aligned to the genetic map]
Markers in each clone should fall in the right location on the genetic map
E. Finish the genome
Use PCR and targeted cloning to fill in all the gaps and get every base
That is a huge effort!
Craig Venter’s way
Whole genome shotgun: skip the BAC-by-BAC approach
Make 3 different libraries:
- 3 kb: most of the clones
- 10 kb: a few
- 50-200 kb: very few
Sequence ends of all clones
Sequences will produce contigs
Huge jigsaw puzzle: you get multiple reads that overlap and so can be put together
Contigs are pieces of contiguous DNA, mostly from the 3 kb library
There will be gaps where sequence is missing
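The jigsaw idea can be illustrated with a toy greedy assembler: repeatedly merge the two reads with the longest suffix-prefix overlap until no overlaps remain. This is a teaching sketch, not the assembler used for any real genome; it assumes error-free reads from one strand, no repeats, and made-up example reads, and the function names are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if shorter than min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair of reads until none overlap; return the contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that lie entirely inside another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best = (0, -1, -1)  # (overlap length, index of left piece, index of right piece)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            # No overlaps left: the remaining pieces are separate contigs with gaps between them.
            return contigs
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


if __name__ == "__main__":
    reads = ["ATGGCGT", "GCGTACC", "TACCGAT", "TTTGCAAC", "GCAACGG"]
    for contig in greedy_assemble(reads):
        print(contig)
```

These reads collapse into two contigs, ATGGCGTACCGAT and TTTGCAACGG, with nothing in the reads to join them. Real assemblers must also handle sequencing errors, both strands and repeats, which this sketch ignores.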
Use “paired end reads” to figure out how these contigs go together
Large libraries provide long-range order
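A paired-end read that starts in one contig and ends in another both orders the two contigs and lets you estimate the gap between them, because the clone length is roughly known from the library (3 kb, 10 kb, ...). A minimal sketch, assuming both contigs are already in the same orientation, mate positions are given as 0-based start coordinates within each contig, and the insert size is exact; the `MatePair` type, `estimate_gap` function and all numbers are hypothetical. Real scaffolding must also resolve orientation, insert-size spread and conflicting links.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class MatePair:
    """One paired-end clone: the first mate lands in contig `left`, the second in `right`."""
    left: str
    left_pos: int   # start of the first mate within the left contig (0-based)
    right: str
    right_pos: int  # start of the second mate within the right contig (0-based)


def estimate_gap(pairs: list[MatePair], contig_len: dict[str, int],
                 insert_size: int, read_len: int) -> dict[tuple[str, str], float]:
    """Average gap estimate for each (left, right) contig link.

    Model: insert_size = (bases from first mate to end of left contig)
                         + gap + (bases from start of right contig to end of second mate).
    A negative estimate suggests the two contigs actually overlap.
    """
    gaps: dict[tuple[str, str], list[int]] = {}
    for p in pairs:
        tail = contig_len[p.left] - p.left_pos  # first mate to end of left contig
        head = p.right_pos + read_len           # start of right contig to end of second mate
        gaps.setdefault((p.left, p.right), []).append(insert_size - tail - head)
    return {link: mean(values) for link, values in gaps.items()}


if __name__ == "__main__":
    lengths = {"A": 5000, "B": 8000}
    pairs = [MatePair("A", 4000, "B", 800), MatePair("A", 4200, "B", 1000)]
    print(estimate_gap(pairs, lengths, insert_size=3000, read_len=700))
```

With these example numbers both pairs imply contig A precedes contig B with a gap of about 500 bp.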
Whole genome shotgun
[Figure: reads from 3 kb, 10 kb and 100 kb libraries]
Cheaper process but can end up with more holes
Q#4. How much sequence is needed to cover the genome?
The probability that a given base is sequenced x times follows a Poisson distribution:
$$P(x,\lambda) = \frac{e^{-\lambda}\lambda^{x}}{x!}$$
where P is the probability, x is the number of times a base is sequenced, and λ is the genomic coverage (e.g. 4x or 10x).
[Figure: fraction of bases vs. # times base sequenced (0-25)]
Poisson distribution: 1x coverage
# times seq   % of genome (1x)
0             37%
1             37%
2             18%
3             6%
4             1.5%
5             0.3%
[Figure: fraction of bases vs. # times base sequenced (0-25), 1x and 2x coverage]
Poisson distribution: 2x coverage
# times seq   % of genome (1x)   % of genome (2x)
0             37%                13.5%
1             37%                27%
2             18%                27%
3             6%                 18%
4             1.5%               9%
5             0.3%               4%
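The table values follow directly from the Poisson formula. A short sketch that evaluates P(x, λ) for λ = 1 and λ = 2; like the formula itself, it assumes reads land uniformly at random across the genome, which real data only approximates. The function name `poisson` is illustrative.

```python
import math


def poisson(x: int, coverage: float) -> float:
    """P(x, lambda): probability that a base is sequenced exactly x times at this coverage."""
    return math.exp(-coverage) * coverage ** x / math.factorial(x)


if __name__ == "__main__":
    print("times     1x      2x")
    for x in range(6):
        print(f"{x:>5}  {poisson(x, 1):6.1%}  {poisson(x, 2):6.1%}")
```

At 1x this gives about 36.8%, 36.8%, 18.4%, 6.1%, 1.5% and 0.3% for 0 to 5 reads per base; at 2x about 13.5%, 27.1%, 27.1%, 18.0%, 9.0% and 3.6%.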
[Figure: Poisson distributions for 1x, 2x, 4x, 7x and 10x coverage; x-axis: times a given base is sequenced, y-axis: probability]
Fraction of genome not covered
1x    37%
2x    14%
4x    2%
7x    0.1%
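Only the x = 0 term of the Poisson formula matters for gaps: the uncovered fraction is e^(-λ), which can be inverted to λ = -ln(fraction) to ask how much coverage a target needs. A sketch under the same uniform-random-reads assumption; the function names are hypothetical.

```python
import math


def fraction_uncovered(coverage: float) -> float:
    """P(0, lambda) = e^-lambda: expected fraction of bases never sequenced."""
    return math.exp(-coverage)


def coverage_for(target_uncovered: float) -> float:
    """Coverage lambda needed for e^-lambda to fall to the target fraction."""
    return -math.log(target_uncovered)


if __name__ == "__main__":
    for cov in (1, 2, 4, 7, 10):
        print(f"{cov:>2}x coverage: {fraction_uncovered(cov):.3%} of genome not covered")
    print(f"Coverage for 0.1% uncovered: {coverage_for(0.001):.1f}x")
```

For a target of 0.1% uncovered this gives about 6.9x, in line with the 7x row above.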
Sequence quality
Want each base sequenced several times so you can have good confidence in the sequence
6-8x is the rule of thumb
The human genome was sequenced to 8-10x coverage
How long does it take? Sanger sequencing
Human genome = 3 x 10^9 bp
To do 10x coverage = 3 x 10^10 bp
# reads = 3 x 10^10 bp / 500 bp = 6 x 10^7 reads
Machine takes 2 hrs for 96 reads
6 x 10^7 reads / 96 reads per 2 hrs = 1.25 x 10^6 machine-hours
Genome center has 300 machines and runs 24/7
1.25 x 10^6 hrs / 300 machines / 24 hrs = 174 days = about 6 months
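The same estimate as a function, so the inputs can be varied. A sketch using the slide's assumptions (500 bp reads, 96 reads per 2-hour run, 300 machines running around the clock); the function name and parameter names are hypothetical.

```python
def sanger_days(genome_bp: float, coverage: float, read_bp: int = 500,
                reads_per_run: int = 96, hours_per_run: float = 2,
                machines: int = 300) -> float:
    """Days of round-the-clock running needed to reach the given coverage."""
    reads = coverage * genome_bp / read_bp                 # total reads required
    machine_hours = reads / reads_per_run * hours_per_run  # time on a single machine
    return machine_hours / machines / 24                   # spread over all machines, 24 h/day


if __name__ == "__main__":
    days = sanger_days(3e9, 10)
    print(f"{days:.0f} days (~{days / 30:.1f} months)")
```

With the slide's numbers this returns about 173.6 days, a little under 6 months.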
Sequencing costs for a typical vertebrate
Average genome size (bp)            1 x 10^9
Typical sequence coverage (bp)      5x = 5 x 10^9
# reads needed (500 bp reads)       1 x 10^7
Cost of 1 read                      $1
Cost of project                     $10M
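The cost table reduces to one multiplication chain. A sketch assuming, as the table does, 500 bp reads at $1 each; the function name `project_cost` is hypothetical.

```python
def project_cost(genome_bp: float, coverage: float, read_bp: int = 500,
                 cost_per_read: float = 1.0) -> tuple[float, float]:
    """Return (reads needed, total cost in dollars) for a read-by-read sequencing project."""
    bases = coverage * genome_bp  # total bases to sequence
    reads = bases / read_bp       # reads needed at this read length
    return reads, reads * cost_per_read


if __name__ == "__main__":
    reads, cost = project_cost(1e9, 5)   # typical vertebrate from the table
    print(f"Vertebrate at 5x: {reads:.1e} reads, ${cost / 1e6:.0f}M")
    reads, cost = project_cost(3e9, 10)  # human genome at 10x, same $1/read assumption
    print(f"Human at 10x: {reads:.1e} reads, ${cost / 1e6:.0f}M")
```

This gives 1 x 10^7 reads and $10M for the typical vertebrate, and 6 x 10^7 reads and $60M for a human genome at 10x under the same $1-per-read assumption.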
New sequencing methods
Human Genome Project relied on capillary electrophoresis using Sanger sequencing: 96 pieces sequenced at a time (1 plate)
New methods get rid of electrophoresis: 100,000s of pieces sequenced at a time
Massively parallel sequencing by synthesis
Companies: 454 Life Sciences, Illumina, others
HGP vs Venter vs Watson
            HGP               Venter                   Watson
Method      Clone by clone    Whole genome shotgun     454 Life Sciences
Coverage    8-10x             7.5x                     7.4x
Time        13 yrs            4 yrs                    4.5 months
Cost        $2.7B             $100M                    $1.5M
Authors     2800              31                       27
How soon till the $1000 genome?
[Figure: cost for a genome ($, log scale from 10^0 to 10^10) by year, 2002-2012]
Q#6. What genomes have been sequenced?
NIH genome page: http://www.ncbi.nlm.nih.gov/Genomes/
Sanger genome links: http://www.ensembl.org/index.html
Vertebrates
Vertebrate genomes sequenced
Divergence times (millions of years, MY) separating different species
Assignment for Wednesday
1. Why sequence a genome?
2. Think about a genome that you would like to see sequenced.
Use the NIH genome site: http://www.genome.gov/10002154
See class website
No class for Labor Day - I am away for the weekend.