Sequencing Informatics
Gabor T. Marth
Department of Biology, Boston College
BI420 – Introduction to Bioinformatics
The nuclear genome (chromosomes)
The genome sequence
• the primary template on which to outline functional features of our genetic code (genes, regulatory elements, secondary structure, tertiary structure, etc.)
Completed genomes
~1 Mb
~100 Mb
>100 Mb
~3,000 Mb
Main genome sequencing strategies
Clone-based shotgun sequencing (Human Genome Project)
Whole-genome shotgun sequencing (Celera Genomics, Inc.)
Hierarchical genome sequencing
BAC library construction
clone mapping
shotgun subclone library construction
sequencing/read processing
sequence reconstruction (sequence assembly)
Lander et al. Nature 2001
Clone mapping – “sequence ready” map
Shotgun subclone library construction
Figure labels: BAC primary clone, cloning vector, sequencing vector, subclone insert
Sequencing
Robotic automation
Lander et al. Nature 2001
Base calling
GGGCTCAGCTGTATCAGCCACGTGCCTACAACAATCTGCCCCT
PHRED: base = A, Q = 40
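A quality value such as Q = 40 is easier to interpret as an error probability. The sketch below uses the standard Phred definition, Q = -10 log10(P), where P is the estimated probability that the called base is wrong. The function names are illustrative and are not part of PHRED itself.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality value Q to an estimated base-call error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability P (0 < P <= 1) to a Phred quality value."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Print the error probability for a few commonly quoted quality values.
    for q in (10, 20, 30, 40):
        print(f"Q = {q}: P(error) = {phred_to_error_prob(q):.6f}")
```

Under this definition, the Q = 40 call shown on the slide corresponds to an estimated error rate of about 1 in 10,000.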
Vector clipping
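Vector clipping removes the part of a read that comes from the sequencing vector rather than from the subclone insert. The following is a minimal sketch of the idea, assuming exact matching only: the read is assumed to start inside the vector and run into the insert, so its beginning should match the end of the vector sequence that flanks the cloning site. The vector sequence, the function name, and the `min_match` threshold are hypothetical; production tools use error-tolerant alignment instead of exact comparison.

```python
def clip_vector(read: str, vector_end: str, min_match: int = 8) -> str:
    """Remove leading vector sequence from a read (exact-match toy version).

    vector_end is the vector sequence immediately upstream of the insert.
    The longest read prefix (at least min_match bases) that equals a suffix
    of vector_end is treated as vector and clipped off.
    """
    longest = min(len(read), len(vector_end))
    for k in range(longest, min_match - 1, -1):
        if read[:k] == vector_end[-k:]:
            return read[k:]
    # No confident vector match: leave the read unchanged.
    return read


if __name__ == "__main__":
    vector_end = "GGATCCTCTAGAGTCGAC"        # hypothetical vector flank
    read = vector_end[-10:] + "TTGACCA"      # 10 vector bases, then insert
    print(clip_vector(read, vector_end))
```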
Sequence assembly
PHRAP
Repetitive DNA may confuse assembly
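Why repeats confuse assembly can be shown with a toy overlap computation. The sketch below is not PHRAP's algorithm; it only finds the longest exact suffix-to-prefix overlap between two reads. The reads are made up for illustration: one read ends in a repeat copy, and two other reads begin with the same repeat but continue into different unique sequence.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_len bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


if __name__ == "__main__":
    # Hypothetical reads: "GATTACA" plays the role of a repeated element.
    left = "CCTGGATTACA"       # ends inside the repeat
    right_1 = "GATTACATTTC"    # repeat copy 1, followed by unique sequence
    right_2 = "GATTACAGGCA"    # repeat copy 2, followed by different sequence

    # Both candidate extensions overlap the left read equally well, so the
    # overlap alone cannot tell which one is the true neighbor.
    print(overlap(left, right_1), overlap(left, right_2))
```

When the repeat is longer than the read, no amount of overlap information within single reads resolves the ambiguity, which is one reason clone maps and finishing are needed.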
Sequence completion (finishing)
CONSED, AUTOFINISH
gap
region of low sequence coverage and/or quality
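Gaps and low-coverage regions are expected in any shotgun project, because reads land at random positions. A rough way to see this is the idealized Lander-Waterman model, which is not on the slides and is used here as an assumption: with N reads of length L from a target of length G, the average coverage is c = N·L/G, and the expected fraction of bases left uncovered is about e^(-c). The clone size and read numbers in the example are hypothetical.

```python
import math


def coverage(n_reads: int, read_len: int, target_len: int) -> float:
    """Average sequence coverage c = N * L / G."""
    return n_reads * read_len / target_len


def expected_uncovered_fraction(c: float) -> float:
    """Expected fraction of bases with no read, under uniform random placement."""
    return math.exp(-c)


if __name__ == "__main__":
    # Hypothetical BAC clone of 150 kb sequenced with 500 bp reads.
    target_len = 150_000
    for n_reads in (600, 1_500, 3_000):
        c = coverage(n_reads, 500, target_len)
        missing = expected_uncovered_fraction(c) * target_len
        print(f"{n_reads} reads: {c:.1f}x coverage, ~{missing:.0f} bp uncovered")
```

Real data are less uniform than the model assumes (cloning bias, repeats, low-quality read ends), so finishing typically has more to close than this estimate suggests.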
New sequencing technologies
From familiar ABI traces (100 x 1,000 bp) …
… to 454 pyrograms (100 thousand x 100 bp) …
… and Solexa reads (50 million x 20 bp).
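The read counts and read lengths quoted on this slide can be turned into total bases per run with simple arithmetic. The sketch below assumes that the three figures belong to ABI, 454, and Solexa in that order; the pairing is inferred from the slide layout, and the dictionary keys are only labels.

```python
def total_bases(n_reads: int, read_len: int) -> int:
    """Total bases produced: number of reads times read length."""
    return n_reads * read_len


# (number of reads, read length in bp), as quoted on the slide.
RUNS = {
    "ABI traces": (100, 1_000),
    "454 pyrograms": (100_000, 100),
    "Solexa reads": (50_000_000, 20),
}

if __name__ == "__main__":
    for platform, (n_reads, read_len) in RUNS.items():
        print(f"{platform}: {total_bases(n_reads, read_len):,} bp")
```

The trade-off is visible in the numbers: the newer platforms yield far more total sequence, but in much shorter reads, which makes repeats harder to span during assembly.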