7/30/2019 Lecture Slides Lecture 2 Slides
The Human Genome Project
Harold Riethman, PhD
The Human Genome Project
Human Genome Mapping
Sanger DNA Sequencing
Public Consortium Draft Sequence
Celera Draft Sequence
Analysis of Draft Human Genome Sequence
Human Genome Mapping
Landmarks: DNA probes, STS, clone ends
Fragment ends: restriction enzyme, meiosis, radiation
Mapping Resolution
From Matise et al., Genome Analysis Vol 4, 1999
Meiotic Recombination Creates the Chromosome Breakpoints for Genetic Linkage Maps
From Chakravarti and Lynn, Genome Analysis Vol 4, 1999
Analysis of Linkage Data
Compare the observed frequency with which 2 markers are cotransmitted to progeny with the frequency of cotransmission expected by random chance.
For markers that are close together on a chromosome, the ratio of observed/expected is high; such markers are said to be linked.
Linkage map distances are statistical estimates of marker distance based on recombination frequencies.
For humans, the total number of breaks, and hence the resolution of human genetic linkage maps, is limited (1 cM, roughly 1 Mb) because the number of meioses is limited.
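The observed-versus-expected comparison is commonly summarized as a LOD score. The slides do not name this statistic, so treat the following as a minimal Python sketch under stated assumptions: every meiosis is phase-known and can be scored as recombinant or non-recombinant, and the function names and example counts are illustrative.

```python
import math


def recombination_fraction(recombinants: int, meioses: int) -> float:
    """Fraction of informative meioses in which the two markers were separated."""
    if meioses <= 0:
        raise ValueError("need at least one informative meiosis")
    return recombinants / meioses


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """log10 likelihood of the data at recombination fraction theta,
    relative to free recombination (theta = 0.5, i.e. unlinked markers)."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    non_recombinants = meioses - recombinants
    log_l_linked = (recombinants * math.log10(theta)
                    + non_recombinants * math.log10(1.0 - theta))
    log_l_unlinked = meioses * math.log10(0.5)
    return log_l_linked - log_l_unlinked


# Hypothetical data: 2 recombinants observed in 20 meioses.
theta_hat = recombination_fraction(2, 20)
print(theta_hat, round(lod_score(2, 20, theta_hat), 2))
```

With these made-up counts the LOD is a little above 3, meaning the data are more than 1,000 times as likely under linkage at theta = 0.1 as under free recombination. Because few meioses are available in human pedigrees, small recombination fractions are estimated imprecisely, which is the resolution limit the slide describes.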
Radiation Hybrid Maps
Radiation creates the chromosome breaks
More radiation, higher frequency of breaks
Tightly linked markers will tend to be retained on the same RH fragments
Radiation Hybrid Maps
Like linkage maps, RH maps are based upon probabilities
Statistical error in marker order and distance
Retention bias near centromeres
Mapping panels of DNA available
Clone-based Physical Maps
Clone library preparation: YACs & BACs
Clone overlap detection and contig construction: STS and fingerprint methods
Contig placement along chromosome: alignment with STS maps & FISH
Cloning Systems
A) Yeast Artificial Chromosome (YAC): 200 kb to 2 Mb
B) Bacterial Artificial Chromosome (BAC): 150 kb to 250 kb
C) Cosmid/Fosmid: 40 kb
Clone-based Physical Map
Clone ends
From Dunham et al., Genome Analysis Vol 3, 1999
Using the same set of STS markers permits integration of the different maps.
YAC-based clone maps were useful for preparing 100 kb resolution STS maps, but BAC-based clone maps became the workhorse for sequencing.
Human Sequence-ready Physical Map
BAC clone-based physical map
From Waterston et al., PNAS 2002
BAC Cloning
150 to 250 kb
From Birren et al., Genome Analysis Vol 3, 1999
BAC fingerprint mapping
From Marra et al., Genome Research 1997
From Marra et al., Genome Research 1997
Sanger DNA Sequencing
Sanger DNA sequencing
From Shendure and Ji, Nat. Biotech 2008
BAC clone-based physical map (From Waterston et al., PNAS 2002)
Sanger DNA Sequencing
Assembly
Anchoring to Chromosome
Preparation of Templates and Sequencing Reactions: highly automated
1.) Growth of clones
  colonies picked robotically, transferred to small cultures in 96-well formats
2.) Purification of DNA
  reagents added and removed robotically, procedures carried out in 96-well format
3.) DNA sequencing reactions
  reagents added and removed robotically, procedures carried out in 96-well or 96 x 4 (384-well) format
Production Sequencing at MIT's Whitehead Institute
Sequence Assembly
From Gordon et al., Genome Research 1998.
Sequencing BAC
At each step, sequences are deposited into GenBank with an accession number.
a) 96 reactions (Phase 0): sample sequences, overlap detection
b) 3 to 5x coverage (Phase 1): assembled draft
c) 8 to 10x coverage (Phase 2): high-quality draft assemblies
d) finished sequence (Phase 3): q values > 40, no gaps in sequence
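Assuming the q values on this slide are Phred-scaled quality scores (q = -10 log10 of the estimated per-base error probability), q > 40 corresponds to fewer than one expected error per 10,000 bases. A small Python sketch of the conversion, with illustrative function names:

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Estimated probability that a base call with quality q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred-scaled quality for an estimated error probability p (0 < p <= 1)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -10 * math.log10(p)


for q in (20, 30, 40):
    # Expected number of bases per error at each quality level.
    print(q, round(1 / phred_to_error_prob(q)))
```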
Green, 1997; Weber and Myers, 1997
From Waterston et al., PNAS 2002
International Human Genome Sequencing Consortium
Draft Human Genome Sequence: Nature, Feb. 15, 2001
Goal: immediate and unrestricted public access to genome sequence
Public Draft Sequence Strategy
BAC clone-based physical map (From Waterston et al., PNAS 2002)
Sanger DNA Sequencing
Assembly
Anchoring to Chromosome
Assembling the Draft Sequence
Filtering
  Nonhuman sequences
  Contamination with other BAC sequences
Layout
  Sequences laid over physical map
  Lab mix-ups: electronic digest to place in correct contig
  BAC end sequences
  STS and FISH maps
Merging
  GigAssembler
  Sequence contig, scaffold
From IHGSC, Nature 2001
N50 Length
Maximum length L such that 50% of all nucleotides lie in contigs (or scaffolds) of length at least L
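The N50 definition can be computed directly by sorting contigs from longest to shortest and accumulating lengths until half of all nucleotides are covered. A minimal Python sketch; the contig lengths in the example are made up for illustration:

```python
def n50(lengths):
    """Largest L such that contigs of length >= L hold at least 50% of all bases."""
    total = sum(lengths)
    if total <= 0:
        raise ValueError("need at least one contig with positive length")
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (in kb): total is 290, half is 145.
print(n50([80, 70, 50, 40, 30, 20]))
```

In this example the two longest contigs (80 + 70 = 150) already hold more than half of the 290 kb, so the N50 is 70 kb. The same function applies to scaffold lengths.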
From IHGSC, Nature 2001
Celera Genomics Draft Human Genome Sequence
Science, Feb. 16, 2001
Business model: patent genes for commercial products; sell access to the genome sequence and Celera-generated annotations
Free public data was used and incorporated by Celera into its assembly
Summary of Input Sequence:
1.) 15 Gb of Celera raw sequence: 5x coverage
2.) 4.4 Gb of Public Draft Sequence (derived from about 23 Gb (7.5x coverage) of raw sequence)
  a.) shredded to a perfect 2x coverage of Bactigs
  b.) 2.96x coverage of genome
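The coverage figures in this summary follow from dividing total sequenced bases by genome size. A rough Python sketch, assuming a genome size of about 3 Gb (the figure given later in this lecture). The uncovered-fraction estimate uses the idealized Lander-Waterman (Poisson) model, which is an added assumption and not something stated on the slide:

```python
import math

GENOME_SIZE_BP = 3_000_000_000  # assumed ~3 billion bp


def fold_coverage(total_bases: float, genome_size: float = GENOME_SIZE_BP) -> float:
    """Average number of times each genome position is sequenced."""
    return total_bases / genome_size


def expected_fraction_uncovered(coverage: float) -> float:
    """Fraction of bases expected to be missed if reads land uniformly at random."""
    return math.exp(-coverage)


celera = fold_coverage(15e9)   # 15 Gb of Celera raw sequence
public = fold_coverage(23e9)   # about 23 Gb of public raw sequence
print(round(celera, 1), round(public, 1))
print(round(expected_fraction_uncovered(celera), 4))
```

Under these assumptions 15 Gb gives 5x, and 23 Gb gives a little over 7.5x, consistent with the approximate figures on the slide. Real reads are not placed uniformly, so the uncovered fraction is only a lower bound on what is missed in practice.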
Celera assembly strategy
From Venter et al., Science 2001
From Venter et al., Science 2001
WGA: Whole Genome Assembly
5x Celera reads + 3x public faux reads + Celera mate-pair data
True WGA(?)
BACtigs are mapped BAC sequence contigs from the public project
CSA: Compartmentalized Shotgun Assembly
(1) Bactigs from region + Celera reads matching Bactigs
(2) Celera unique scaffolds mapping to region
(3) scaffold tiling for compartment checked manually
(4) Public sequence within compartment shredded, then reassembled with Celera reads from compartment
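Step (4) relies on shredding assembled public sequence into artificial ("faux") reads. The slides do not say how the shredding was implemented, so the following Python sketch is only one simple way to do it: fixed-length reads that each start half a read length after the previous one, so every interior base is covered twice. The function name and the read length in the example are hypothetical.

```python
def shred(sequence: str, read_length: int):
    """Cut a sequence into overlapping faux reads with a half-read offset.

    Returns (start, read) pairs. Interior bases fall in exactly two reads;
    the first and last half-read of the sequence are covered once.
    """
    if read_length < 2:
        raise ValueError("read_length must be at least 2")
    step = read_length // 2
    reads = []
    # Stop once the remaining tail is already inside the previous read.
    for start in range(0, max(len(sequence) - step, 1), step):
        reads.append((start, sequence[start:start + read_length]))
    return reads


# Toy example: a 20-base sequence shredded into 8-base faux reads.
for start, read in shred("ACGTACGTACGTACGTACGT", 8):
    print(start, read)
```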
From Venter et al., Science 2001
Comparison, Public Draft View
Comparison, Celera View
Analysis of Draft Genome Sequences
Public Data
National Center for Biotechnology Information: www.ncbi.nlm.nih.gov/genome/guide/human/
UCSC Genome Browser (extensive annotation): genome.ucsc.edu/
Footnote: Initially, Celera data was available only through licensing agreements, but later it was deposited into the public databases
Human genome size: ~3 billion base pairs
No. of genes: ~40,000
Average gene density: 12 genes/Mb
Gene-rich chromosomes: 17, 19, 22
Gene-poor chromosomes: 4, 13, 18, X, Y
CpG islands
Regions with higher frequency of CpG dinucleotides.
> 200 bp regions with > 50% GC.
Associated with 5' ends of genes / near transcriptional start sites.
30,000 to 50,000 in the human genome.
Y: 2.9 islands/Mb; 19: 43 islands/Mb
Good correlation of gene density and CpG islands
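The length and GC criteria above can be checked for a candidate region with a few lines of Python. This is a sketch, not a full island-finding method: the observed/expected CpG ratio and its 0.6 threshold are a commonly used extra criterion that is an assumption here and does not appear on the slide, and the function names are illustrative.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed CpG count divided by the count expected from C and G frequencies."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def looks_like_cpg_island(seq: str, min_length: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply the slide's length and GC thresholds plus the assumed obs/exp cutoff."""
    if len(seq) <= min_length:
        return False
    return gc_fraction(seq) > min_gc and cpg_obs_exp(seq) > min_obs_exp


# Toy sequences: a CpG-dense repeat versus an AT-only repeat.
print(looks_like_cpg_island("CG" * 200), looks_like_cpg_island("AT" * 200))
```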
Recombination Rate
On average: higher in females than males.
Highly variable among different genomic regions.
Higher in telomeric regions.
Lower around the centromeres.
Repeat Content
About 50% of the genome
5 classes of repetitive elements:
  transposon-derived repeats (interspersed repeats): 45% of genome; LINEs, SINEs, LTRs, DNA transposons
  inactive retroposed copies of cellular genes (pseudogenes, e.g. intronless inactivated genes)
  simple sequence repeats: micro-, minisatellites
  segmental duplications
  blocks of tandemly repeated sequences (e.g. around centromeres, telomeres, short arms of acrocentric chromosomes)
Sequence Variation
Unrelated individuals are 99.9% identical at the DNA sequence level.
Most common type of variant = single nucleotide polymorphisms (SNPs)
GGATCTA    GGAGCTA
CCTAGAT    CCTCGAT
SNP rate ~1 per 1,200 bp; 1% of them affect protein function.
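The two alleles in the slide's example differ at a single position. A short Python sketch, assuming the two top strands are already aligned and of equal length, that locates the SNP, checks the complementary strand, and scales the quoted rate to a genome of about 3 billion bp. Function names are illustrative.

```python
def snp_positions(allele_a: str, allele_b: str):
    """0-based positions at which two aligned, equal-length sequences differ."""
    if len(allele_a) != len(allele_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (x, y) in enumerate(zip(allele_a, allele_b)) if x != y]


def complement(seq: str) -> str:
    """Base-by-base complement (not reversed), matching the slide's layout."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))


top_a, top_b = "GGATCTA", "GGAGCTA"
print(snp_positions(top_a, top_b))           # position of the T/G difference
print(complement(top_a), complement(top_b))  # bottom strands as drawn

# Rough genome-wide scale at ~1 SNP per 1,200 bp in ~3 billion bp.
print(3_000_000_000 // 1200)
```

At this rate, two unrelated genomes would be expected to differ at roughly 2.5 million single-nucleotide positions, which is in line with the 99.9% identity figure.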
Gene Prediction Methods
Direct evidence of transcription provided by ESTs or mRNA.
Indirect evidence of sequence similarity to known genes and proteins.
Ab initio recognition of exons using HMMs: Genscan, Genie.
Estimated gene number from the draft sequences: 35,000 to 40,000