Genome sequencing


Vocabulary

• BAC: Bacterial Artificial Chromosome: cloning vector for E. coli (the yeast counterpart is the YAC, Yeast Artificial Chromosome)

• PAC, cosmid, fosmid, plasmid: cloning vectors for E. coli

• Library: collection of fragments of a genome in cloning vectors

• Draft: crude first-generation sequence assembly

• Scaffold: Sequences which are anchored to a genetic map

Vocabulary 2

• Minimal tiling path: Minimal set of overlapping clones that together provides complete coverage across a genomic region

• Coverage: The number of times a genomic region is represented in a collection of clones or sequence reads

• Contig: Alignment of overlapping reads

• 'N50 length' is defined as the largest length L such that 50% of all nucleotides are contained in contigs of size at least L
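The coverage definition above can be made concrete with a small sketch. It assumes reads (or clones) are already placed on the region as 0-based, half-open `(start, end)` intervals; the function names and the example coordinates are hypothetical and serve only as illustration.

```python
def depth_per_position(region_length, reads):
    """Return how many reads cover each position of a region.

    reads: list of (start, end) intervals, 0-based and half-open.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (region_length + 1)
    for start, end in reads:
        start = max(start, 0)
        end = min(end, region_length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1

    # A running sum over the difference array gives the depth per position.
    depth = []
    running = 0
    for change in diff[:region_length]:
        running += change
        depth.append(running)
    return depth


def mean_coverage(region_length, reads):
    """Average number of times a position in the region is represented."""
    return sum(depth_per_position(region_length, reads)) / region_length


# Hypothetical 10 bp region covered by three reads.
example_reads = [(0, 5), (3, 8), (4, 10)]
example_depth = depth_per_position(10, example_reads)
example_mean = mean_coverage(10, example_reads)
```

Here the three reads total 16 bases over a 10 bp region, so the mean coverage is 1.6, while the per-position depth ranges from 1 to 3.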

BAC by BAC vs. whole genome shotgun

Bac by Bac sequencing (slow)

Minimal tiling path
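One way to pick a minimal tiling path is a greedy interval cover, sketched below. This is an illustration, not necessarily the procedure used in any sequencing project: it assumes each clone already has known start and end coordinates on the region (in practice these come from a physical map), and it accepts clones that merely abut, whereas a real tiling path would require some minimum overlap. The function name and the clone data are hypothetical.

```python
def minimal_tiling_path(clones, region_start, region_end):
    """Pick a smallest set of clones covering [region_start, region_end).

    clones: list of (name, start, end) tuples, half-open coordinates.
    Raises ValueError if the clones leave a gap in the region.
    """
    ordered = sorted(clones, key=lambda clone: clone[1])
    path = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        best = None
        # Among clones starting at or before the covered position,
        # keep the one that reaches farthest to the right.
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"gap in clone coverage at position {covered_to}")
        path.append(best[0])
        covered_to = best[2]
    return path


# Hypothetical clones with coordinates in kb.
example_clones = [
    ("A", 0, 100),
    ("B", 50, 180),
    ("C", 90, 200),
    ("D", 150, 300),
    ("E", 190, 320),
]
example_path = minimal_tiling_path(example_clones, 0, 300)
```

For these example clones the path is A, C, E: clones B and D are redundant because C and E each reach farther from the same covered position.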

Whole genome shotgun sequencing

WGSA

Hybrid shotgun sequencing

N50

[Figure: cumulative contig content (in % of genome) plotted against contig size (in kb); the N50 is the contig size at which the curve reaches 50%]

• Order contigs according to size

• Compute cumulative size

• N50 = contig size (sequence length) which marks 50% of genome content
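The three steps above can be sketched in a few lines. The function name and the contig sizes are hypothetical, and the sketch takes 50% of the total assembled length as its threshold; if an estimated genome size were used as the denominator instead, the result would be what is usually called NG50.

```python
def n50(contig_lengths):
    """Return the largest length L such that contigs of length >= L
    contain at least 50% of all assembled nucleotides."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    total = sum(contig_lengths)
    running = 0
    # Order contigs from largest to smallest and accumulate their sizes.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # The first contig that brings the cumulative size to half of
        # the total marks the N50.
        if 2 * running >= total:
            return length


# Hypothetical contig sizes in kb, for illustration only.
example_contigs_kb = [100, 80, 50, 30, 20, 10, 10]
example_n50 = n50(example_contigs_kb)
```

With these example sizes the total is 300 kb; the two largest contigs together hold 180 kb, which passes the 150 kb halfway mark, so the N50 is 80 kb.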

Human genome

• 2001: 2 draft sequences published

• Public BAC by BAC sequence

• Celera's WGSA

– 90% of euchromatic sequence

– 150,000 gaps

– N50: 81 kb

– Error rate: 1:10,000

• 2004: Finished public sequence

– 99% of euchromatic sequence

– 341 gaps

– N50: 38,500 kb

– Error rate: 1:100,000

The problem with complex genomes

• Gaps

• Orientation of contigs not known

• Near identical repeats hard to resolve

Finishing the sequence

Gap Draft sequence

Resolving repeats

Detecting and resolving repeats in WGSA

Clone orientation

Segmental duplications / gaps

Blue: duplications of size > 10 kb

Red: gaps of size > 300 kb