ASHG - GRC Workshop Tina Lindsay ASHG Oct 18, 2014


DESCRIPTION

Using CHM1 by Tina Graves


Page 1: Ashg grc workshop2014_tg

ASHG - GRC Workshop

Tina Lindsay

ASHG Oct 18, 2014

Page 2: Ashg grc workshop2014_tg

The Human Reference is Not Complete

• Reference has been found to not be optimal in some regions

• Structural variation makes it difficult to assemble a truly representative genome when using a diploid sample

• Some regions were recalcitrant to closure with technology and resources available at the time

• Additional sequences are needed to capture the full range of diversity in humans

Page 3: Ashg grc workshop2014_tg

UGT2B17 – Conflicting Alleles

NCBI36 NC_000004.10 (chr4) Tiling Path: AC074378.4, AC079749.5, AC134921.2, AC147055.2, AC140484.1, AC019173.4, AC093720.2, AC021146.7 (genes shown: TMPRSS11E, TMPRSS11E2)

GRCh37 NC_000004.11 (chr4) Tiling Path: AC074378.4, AC079749.5, AC134921.1, AC147055.2, AC093720.2, AC021146.7 (gene shown: TMPRSS11E)

GRCh37: NT_167250.1 (UGT2B17 alternate locus): AC074378.4, AC140484.1, AC019173.4, AC226496.2, AC021146.7 (gene shown: TMPRSS11E2)

[Figure labels: GAP; Xue Y et al., 2008]

Page 4: Ashg grc workshop2014_tg

Allelic Diversity vs. Segmental Duplication

Diploid Genome

[Figure: repeat copies (noted by color difference) and allelic copies; bases shown: A, A, C, T, C, G, C, C]

With a diploid genome, there is significant ambiguity in sorting allelic copies from repeat copies.

Haploid Genome

[Figure: repeat copies only (noted by color difference); bases shown: A, C, C, C]

With a haploid genome, allelic differences are eliminated, and base differences are likely indicative of repeat copies.
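The contrast between the two panels can be illustrated with a small sketch. It assumes a hypothetical pileup of base calls at one aligned position; the function name `classify_site` and the `min_fraction` threshold are illustrative and do not come from the talk. It only shows why a second well-supported base is ambiguous in a diploid sample but points to repeat copies in a haploid one.

```python
from collections import Counter


def classify_site(bases, ploidy, min_fraction=0.2):
    """Interpret the bases seen at one aligned position.

    bases: string of base calls from reads at the position (hypothetical input).
    ploidy: 1 for a haploid source such as CHM1, 2 for a diploid sample.
    min_fraction: illustrative cutoff for treating a base as real, not error.
    """
    counts = Counter(bases.upper())
    total = sum(counts.values())
    # Keep only bases supported by a sizeable share of the reads.
    supported = [b for b, n in counts.items() if n / total >= min_fraction]
    if len(supported) <= 1:
        return "no difference"
    if ploidy == 1:
        # No second allele exists, so a second base suggests a repeat copy.
        return "likely repeat copies"
    # In a diploid sample the second base could be an allele or a repeat copy.
    return "ambiguous: allelic or repeat copies"


print(classify_site("AAAACCCC", ploidy=1))
print(classify_site("AAAACCCC", ploidy=2))
```

The same pileup gives a different interpretation depending only on the ploidy of the source, which is the argument for using a haploid sample in duplicated regions.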

Page 5: Ashg grc workshop2014_tg

Hydatidiform mole

1. Fertilization of an oocyte without a nucleus

2. Post-zygotic diploidization of triploid zygotes

[Figure labels: Oocyte (?); Androgenetic HM; chromosome complement 23X, 23X]

Page 6: Ashg grc workshop2014_tg

Initial Use Of CHM1 Source

• CHORI-17 BAC Library

• CHORI-17 BAC end sequences (n=325,659)

• CHORI-17 multiple enzyme fingerprint map (1560 fpc contigs)

• CHORI-17 BACs

• >750 have been sequenced

• 590 of them are in GenBank as phase 3

Page 7: Ashg grc workshop2014_tg

SRGAP2 Homology between genes

Shows nearly identical segments between SRGAP2A and SRGAP2 paralogs

Shows homology between SRGAP2B and SRGAP2C

Dennis et al., 2012

SRGAP2A

SRGAP2B

SRGAP2C

Page 8: Ashg grc workshop2014_tg

1q21

1q21 patch alignment to chromosome 1

1q32 1q21 1p21

Page 9: Ashg grc workshop2014_tg

IGH Region Highlights Allelic Differences

Watson et al., 2013

Page 10: Ashg grc workshop2014_tg

Williams-Beuren Syndrome region

Slide courtesy of Megan Dennis

Page 11: Ashg grc workshop2014_tg

Current status of CHM1 resources

• CHORI-17 BAC Library (created from CHM1 cell line)

• CHORI-17 BAC end sequences (n=325,659)

• CHORI-17 multiple enzyme fingerprint map (1560 fpc contigs)

• CHORI-17 BACs (>750 have been sequenced, with 592 of them in GenBank as phase 3)

• Active cell line

• >100X coverage Illumina 100bp reads

• 300, 500bp, 3kb inserts

• Reference assisted assembly CHM1_1.1

• BioNano genome map

• >50X coverage of PacBio long read data

Page 12: Ashg grc workshop2014_tg

CHM1_1.1 Assembly

• Reference-guided assembly – SRPRISM v2.3, R. Agarwala

• Alignment of Illumina reads to GRCh37 primary assembly

• CHORI-17 BAC clone tilepaths were then incorporated

• 428 total clones

• 324 clones in 45 tilepaths

• 104 clones as singletons

• Comparison back to GRCh37 reference to provide appropriate gap sizes

• Assembly submitted to GenBank

• http://www.ncbi.nlm.nih.gov/assembly/GCF_000306695.2

• Paper to be published soon

• Genome Research (in press)

• bioRxiv DOI: http://dx.doi.org/10.1101/006841

Page 13: Ashg grc workshop2014_tg

CHM1_1.1 Assembly

Total Sequence Length 3,037,866,619 bp

Total Assembly Gap Length 210,229,812 bp

Number of Scaffolds 163

Scaffold N50 50,362,920 bp

Number of Contigs 40,828

Contig N50 143,936 bp
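For readers unfamiliar with the N50 figures above, the following is a minimal sketch of how an N50 is computed, followed by the gap fraction implied by the reported totals. The toy lengths are made up for illustration and are not CHM1_1.1 data, and the gap fraction assumes that the total sequence length includes gaps.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the summed length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must not be empty")


# Toy contig lengths (invented for illustration only).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))

# Gap fraction from the totals reported on the slide
# (assumes the total sequence length includes gaps).
total_length = 3_037_866_619
gap_length = 210_229_812
gap_fraction = gap_length / total_length
print(f"{gap_fraction:.2%} of the assembly length is gap")
```

The large difference between scaffold N50 and contig N50 reflects how many gaps remain inside the scaffolds.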

[Figure labels: CHM1_1.1, GRCh37]

Page 14: Ashg grc workshop2014_tg

Incorporation of CHM1_1.1 Assembly Data in GRCh38

Page 15: Ashg grc workshop2014_tg

PacBio CHM1 Assembly potentially fills GRCh38 Gaps

GRCh38

PacBio CHM1

Page 16: Ashg grc workshop2014_tg

PacBio CHM1 Assembly Shows Data Not in GRCh38

GRCh38

PacBio CHM1

Second Pass Alignment

Page 17: Ashg grc workshop2014_tg

CHM1 BioNano Genome Map Aligned to GRCh38

GRCh38

CHM1 BioNano Map

~15kb additional data

Page 18: Ashg grc workshop2014_tg

BioNano SV Calls Identified Assembly Problems

[Figure labels: Collapse; Expansion in Assembly; Gap in Sequence; CHM1_1.1 Assembly; CHM1 BioNano Map]

Page 19: Ashg grc workshop2014_tg

Collapse in Sequence Data

Thought to be missing ~100kb in sequenced clones

GRCh38

Page 20: Ashg grc workshop2014_tg

Gap Sizing

Chr8 – Stalled Gap

Estimated at ~150kb

Sized using the CHM1 genome map: >500 kb

GRCh38
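One plausible way to size a gap from a genome map is sketched below; this is not necessarily the procedure used for the chr8 gap. It assumes two map labels that flank the gap can be placed in both the map and the reference. The function name and all coordinates are hypothetical.

```python
def gap_size_from_map(map_left, map_right, ref_left, gap_start, gap_end, ref_right):
    """Estimate a gap's size from a genome map (hypothetical helper).

    map_left, map_right: positions (bp) on the map of two labels flanking the gap.
    ref_left, ref_right: positions (bp) of the same two labels in the reference.
    gap_start, gap_end: reference coordinates of the gap (end exclusive).
    """
    # Physical distance between the two labels as measured on the map.
    map_span = map_right - map_left
    # Sequenced bases between the labels, excluding the placeholder gap.
    sequenced_flank = (gap_start - ref_left) + (ref_right - gap_end)
    return map_span - sequenced_flank


# Invented coordinates: a 150 kb placeholder gap with 50 kb of sequence
# between each label and the gap edge, and 620 kb between labels on the map.
estimate = gap_size_from_map(
    map_left=2_000_000, map_right=2_620_000,
    ref_left=1_000_000, gap_start=1_050_000,
    gap_end=1_200_000, ref_right=1_250_000,
)
print(estimate)
```

The estimate depends only on the map span and the sequenced flanks, so the placeholder size recorded for the gap in the reference does not influence it.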

Page 21: Ashg grc workshop2014_tg

Future of CHM1 Assembly

• Plan to make as contiguous and accurate as possible

• Incorporate PacBio assembly where possible

• Additional CH17 clones being sequenced through segmentally duplicated and structurally variant regions to provide local assembly benefits (isolates the repeats)

Page 22: Ashg grc workshop2014_tg

CYP2D6 – Providing Alternate Alleles

ABC7 (NA18517)

ABC8 (NA18507)

ABC9 (NA18956)

ABC11 (NA18555)

Page 23: Ashg grc workshop2014_tg

Future Directions

• Continued improvement of the CHM1 genome

• Integration of the Pacific Biosciences whole genome assembly

• BioNano genome map data

• Continue to add diversity to the reference by sequencing new samples that provide diversity beyond what is currently represented in GRCh38

• Continued sequencing of CH17 single haplotype BAC tilepaths to better represent segmentally duplicated regions

• Additional collaborations with the community to develop tools to more fully utilize the full reference assembly (alternate haplotypes)

Page 24: Ashg grc workshop2014_tg

Acknowledgements

The Genome Institute at Washington University in St. Louis

Rick Wilson

Bob Fulton

Wes Warren

Karyn Meltz Steinberg

Vince Magrini

Derek Albracht

Milinn Kremitzki

Susan Rock

Debbie Scheer

Aye Wollam

The Finishing and Bioinformatics Teams at The Genome Institute

University of Washington

Evan Eichler

Megan Dennis

Xander Nuttle

NCBI

Richa Agarwala

Valerie Schneider

University of Pittsburgh School of Medicine (CHM1 cell line)

Urvashi Surti

Personalis

Deanna Church

BioNano Genomics

Pacific Biosciences

UCSF

Pui-Yan Kwok

Yvonne Lai

Chin Lin

Catherine Chu

CHORI

Pieter de Jong
