Sequencing Informatics Gabor T. Marth Department of Biology, Boston College marth@bc.edu BI420 – Introduction to Bioinformatics

Page 1: Sequencing Informatics Gabor T. Marth Department of Biology, Boston College marth@bc.edu BI420 – Introduction to Bioinformatics

Sequencing Informatics

Gabor T. Marth

Department of Biology, Boston College
marth@bc.edu

BI420 – Introduction to Bioinformatics

Page 2

The nuclear genome (chromosomes)

Page 3

The genome sequence

• the primary template on which to outline functional features of our genetic code (genes, regulatory elements, secondary structure, tertiary structure, etc.)

Page 4

Completed genomes

~1 Mb

~100 Mb

>100 Mb

~3,000 Mb

Page 5

Main genome sequencing strategies

Clone-based shotgun sequencing (Human Genome Project)

Whole-genome shotgun sequencing (Celera Genomics, Inc.)

Page 6

Hierarchical genome sequencing

BAC library construction

clone mapping

shotgun subclone library construction

sequencing

sequence reconstruction (sequence assembly)

Lander et al. Nature 2001

Page 7

Clone mapping – “sequence ready” map

Page 8

Hierarchical genome sequencing

BAC library construction

clone mapping

shotgun subclone library construction

sequencing/read processing

sequence reconstruction (sequence assembly)

Lander et al. Nature 2001

Page 9

Shotgun subclone library construction

BAC primary clone

cloning vector

sequencing vector

subclone insert

Page 10

Hierarchical genome sequencing

BAC library construction

clone mapping

shotgun subclone library construction

sequencing/read processing

sequence reconstruction (sequence assembly)

Lander et al. Nature 2001

Page 11

Sequencing

Page 12

Robotic automation

Lander et al. Nature 2001

Page 13

Base calling

GGGCTCAGCTGTATCAGCCACGTGCCTACAACAATCTGCCCCT

Page 14

Base calling

PHRED

base = A, Q = 40
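The quality value on this slide (Q = 40) is a Phred score, defined as Q = -10 · log10(P), where P is the estimated probability that the called base is wrong. A minimal sketch of the conversion follows; the function names are illustrative and are not part of the PHRED program.

```python
import math

def phred_to_error_prob(q):
    """Convert a Phred quality value Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Convert an error probability P to a Phred quality value Q = -10 * log10(P)."""
    return -10 * math.log10(p)

# Q = 40, as on the slide, corresponds to about 1 expected error in 10,000 calls.
print(phred_to_error_prob(40))
print(error_prob_to_phred(0.001))
```

Each step of 10 in Q lowers the error probability tenfold, so Q = 20 is about 1 error in 100 and Q = 30 about 1 in 1,000.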

Page 15

Vector clipping
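Vector clipping removes the parts of a read that come from the sequencing vector rather than from the subclone insert. The sketch below is a toy version that assumes the vector sequence flanking the cloning site is known and appears in the read without errors. The function name and the flank sequences are hypothetical; real pipelines generally need error-tolerant alignment rather than exact string matching.

```python
def clip_vector(read, left_flank, right_flank=None):
    """Return the insert portion of `read`.

    Everything up to and including the first exact match of `left_flank`
    is removed. If `right_flank` is given and found after that point,
    the read is also truncated where it begins.
    """
    start = 0
    pos = read.find(left_flank)
    if pos != -1:
        start = pos + len(left_flank)

    end = len(read)
    if right_flank:
        pos = read.find(right_flank, start)
        if pos != -1:
            end = pos

    return read[start:end]

# Hypothetical read: vector, then insert, then vector again.
read = "TTGACGAATTC" + "GGGCTCAGCTGTATC" + "GGATCCAAT"
print(clip_vector(read, left_flank="GAATTC", right_flank="GGATCC"))
```

If neither flank is found, the read is returned unchanged, so the function is safe to run on reads that contain no vector.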

Page 16

Hierarchical genome sequencing

BAC library construction

clone mapping

shotgun subclone library construction

sequencing/read processing

sequence reconstruction (sequence assembly)

Lander et al. Nature 2001

Page 17

Sequence assembly

PHRAP
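To illustrate the idea of sequence assembly, here is a naive greedy overlap-merge sketch. It is not PHRAP's algorithm: it uses exact suffix/prefix matches only, ignores base qualities and reverse-complement reads, and all names and the `min_len` parameter are illustrative assumptions.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no such overlap of at least `min_len` bases exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=5):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(reads)
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # no remaining overlaps: done
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]

# Three overlapping reads taken from the start of the base-calling example.
reads = ["GGGCTCAGCTG", "CAGCTGTATCAG", "GTATCAGCCACG"]
print(greedy_assemble(reads))
```

Reads that share no sufficiently long overlap are left as separate contigs, which is how gaps show up in the output of this sketch.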

Page 18

Repetitive DNA may confuse assembly

Page 19

Sequence completion (finishing)

CONSED, AUTOFINISH

gap

region of low sequence coverage and/or quality
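Finishing targets gaps and regions of low coverage. As a rough sketch of how such regions could be located, the code below computes per-base read depth from read placements on a contig and reports the intervals that fall below a threshold. The input format, the function name, and the `min_depth` default are assumptions for illustration and do not describe how CONSED or AUTOFINISH work.

```python
def low_coverage_regions(contig_len, placements, min_depth=2):
    """Return half-open intervals (start, end) where read depth < min_depth.

    `placements` is a list of (start, length) pairs giving where each read
    aligns on the contig, using 0-based start coordinates.
    """
    depth = [0] * contig_len
    for start, length in placements:
        for pos in range(start, min(start + length, contig_len)):
            depth[pos] += 1

    regions = []
    region_start = None
    for pos, d in enumerate(depth):
        if d < min_depth and region_start is None:
            region_start = pos              # a low-coverage region opens
        elif d >= min_depth and region_start is not None:
            regions.append((region_start, pos))  # the region closes
            region_start = None
    if region_start is not None:
        regions.append((region_start, contig_len))
    return regions

placements = [(0, 8), (2, 8), (12, 8), (14, 6)]
print(low_coverage_regions(20, placements, min_depth=2))
print(low_coverage_regions(20, placements, min_depth=1))  # true gaps only
```

With `min_depth=1` the function reports only positions covered by no read at all, which corresponds to the "gap" case on the slide.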

Page 20

New sequencing technologies

From familiar ABI traces … (100 x 1,000 bp)

… to 454 pyrograms … (100 thousand x 100 bp)

… and Solexa reads. (50 million x 20 bp)
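The three figures on this slide read as number of reads x read length. Multiplying them gives the total bases produced, which makes the scale difference between the technologies explicit. The pairing of each figure with a platform follows the order on the slide and is an assumption, as is reading the figures as per-run output.

```python
# (number of reads, read length in bp), paired with platforms by slide order.
platforms = {
    "ABI traces": (100, 1_000),
    "454 pyrograms": (100_000, 100),
    "Solexa reads": (50_000_000, 20),
}

def total_bases(num_reads, read_len):
    """Total bases produced: number of reads times read length."""
    return num_reads * read_len

for name, (num_reads, read_len) in platforms.items():
    print(f"{name}: {total_bases(num_reads, read_len):,} bp")
```

Under these assumptions the totals are 100 kb, 10 Mb, and 1 Gb, so output grows by orders of magnitude even as individual reads get shorter.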