Genome representation and variant identification


Deanna M. Church, NCBI

The Reference Assembly is NOT Static

NCBI35 (hg17), NCBI36 (hg18), GRCh37 (hg19), GRCh37.p9

Image credit: http://www.tohlejokes.com

http://genomereference.org

Resolved: 716, Open: 697

http://www.ncbi.nlm.nih.gov/dbvar

Studies

Variant Regions

Variant Calls

Variant Region nsv531833, type: CNV

Variant Call nssv577112: type: copy number gain; Method: Oligo aCGH; Analysis: Probe signal intensity; Phenotype: Autism, etc.; Clinical: Pathogenic; Copy Number: 3

Variant Call nssv580124: type: copy number loss; Method: Oligo aCGH; Analysis: Probe signal intensity; Phenotype: Autism; Clinical: Pathogenic; Copy Number: 1
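The accessions in this section follow dbVar's prefix scheme for NCBI-issued records: nstd for studies, nsv for variant regions, and nssv for the variant calls that support them. The sketch below is a minimal, hypothetical Python model of that hierarchy (the class and function names are illustrative, not a dbVar API), populated with the region and calls shown above.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model of dbVar's study > variant region > variant call hierarchy.

@dataclass
class VariantCall:
    accession: str    # nssv..., e.g. "nssv577112"
    call_type: str    # e.g. "copy number gain"
    method: str       # e.g. "Oligo aCGH"
    analysis: str     # e.g. "Probe signal intensity"
    clinical: str     # clinical assertion, e.g. "Pathogenic"
    copy_number: int

@dataclass
class VariantRegion:
    accession: str    # nsv..., e.g. "nsv531833"
    region_type: str  # e.g. "CNV"
    calls: list[VariantCall] = field(default_factory=list)

    def call_types(self) -> set[str]:
        """Distinct call types among the calls supporting this region."""
        return {c.call_type for c in self.calls}

def accession_level(accession: str) -> str:
    """Classify an NCBI-issued dbVar accession by its prefix."""
    for prefix, level in (("nstd", "study"), ("nssv", "variant call"), ("nsv", "variant region")):
        if accession.startswith(prefix):
            return level
    raise ValueError(f"unrecognized dbVar accession: {accession}")

region = VariantRegion("nsv531833", "CNV", [
    VariantCall("nssv577112", "copy number gain", "Oligo aCGH", "Probe signal intensity", "Pathogenic", 3),
    VariantCall("nssv580124", "copy number loss", "Oligo aCGH", "Probe signal intensity", "Pathogenic", 1),
])
print(sorted(region.call_types()))    # ['copy number gain', 'copy number loss']
print(accession_level("nssv577112"))  # variant call
```

A single region can carry calls of opposite types, as nsv531833 does; the model keeps each call's method, analysis and clinical assertion on the call itself rather than collapsing them onto the region.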

Methods, Analysis

Publications, Samples

Submitted assembly

Figure: Variant Call Ambiguity. A call can be placed with a single start and stop, or with an inner start/stop and an outer start/stop. Probes with decreased signal intensity lie inside the variant, the flanking probes with expected signal intensity lie outside it, and each breakpoint falls somewhere between the two.
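For an array-based call, the inner and outer coordinates can be read directly off the probe layout. The sketch below assumes, hypothetically, that each probe is a single sorted position flagged as decreased or expected signal, and that the variant is one contiguous run of decreased probes flanked on both sides by expected-signal probes; real aCGH segmentation is considerably more involved.

```python
from typing import NamedTuple

class Placement(NamedTuple):
    outer_start: int
    inner_start: int
    inner_stop: int
    outer_stop: int

def placement_from_probes(probes: list[tuple[int, bool]]) -> Placement:
    """Derive inner/outer coordinates from (position, decreased_signal) probes.

    Assumes probes are sorted by position and contain exactly one contiguous
    run of decreased-signal probes flanked by expected-signal probes.
    """
    decreased = [i for i, (_, low) in enumerate(probes) if low]
    if not decreased:
        raise ValueError("no probes with decreased signal")
    first, last = decreased[0], decreased[-1]
    if last - first + 1 != len(decreased):
        raise ValueError("decreased-signal probes are not contiguous")
    if first == 0 or last == len(probes) - 1:
        raise ValueError("run is not flanked by expected-signal probes")
    return Placement(
        outer_start=probes[first - 1][0],  # last expected-signal probe before the run
        inner_start=probes[first][0],      # first probe with decreased signal
        inner_stop=probes[last][0],        # last probe with decreased signal
        outer_stop=probes[last + 1][0],    # first expected-signal probe after the run
    )

probes = [(1000, False), (2000, False), (3000, True), (4000, True), (5000, True), (6000, False)]
print(placement_from_probes(probes))
# Placement(outer_start=2000, inner_start=3000, inner_stop=5000, outer_stop=6000)
```

The breakpoints are only known to lie in the intervals outer_start..inner_start and inner_stop..outer_stop, which is exactly the ambiguity the inner/outer fields record.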

Fosmid clone (40 Kb +/- 1 Kb)

Clone ends map 20 Kb apart: the clone has an insertion relative to the genome

Clone ends map 60 Kb apart: the clone has a deletion relative to the genome
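The fosmid logic reduces to comparing the mapped distance between a clone's two end sequences with its expected insert size. A minimal sketch, assuming the 40 Kb +/- 1 Kb figure from the slide as the default; paired-end pipelines generally work from an empirical size distribution with stricter cutoffs, which this ignores.

```python
def classify_clone(mapped_span_kb: float, expected_kb: float = 40.0,
                   tolerance_kb: float = 1.0) -> tuple[str, float]:
    """Compare where a fosmid's end sequences map with its expected insert size.

    Returns the inferred event relative to the reference and its approximate size in Kb.
    """
    difference = mapped_span_kb - expected_kb
    if abs(difference) <= tolerance_kb:
        return ("concordant", 0.0)
    if difference < 0:
        # Ends map closer than expected: the clone carries sequence the reference lacks.
        return ("insertion", -difference)
    # Ends map farther apart than expected: the clone lacks sequence present in the reference.
    return ("deletion", difference)

print(classify_clone(20))    # ('insertion', 20.0)
print(classify_clone(60))    # ('deletion', 20.0)
print(classify_clone(40.5))  # ('concordant', 0.0)
```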

Assembly, Mis-assembly, Biology and Variant Interpretation

BAC insert, BAC vector

Shotgun sequence

Assemble

GAPS

“Finishers” then fill the gaps manually, often by PCR.

Figure panels: NCBI35 (hg17), NCBI36 (hg18), GRCh37 (hg19)

AL139246.20, AL139246.21

Build sequence contigs from the contigs defined in the TPF (Tiling Path File).

Check for orientation consistency; select switch points; instantiate the sequence for further analysis.

Figure: switch points join overlapping components into the consensus sequence (NCBI36).
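Instantiating a sequence from a tiling path amounts to taking each component up to its switch point and continuing in the next component, reverse-complementing any component used in minus orientation. The sketch below expresses each component's contribution as an AGP-style tile (component, start, stop, orientation); the component names and sequences are made up, and it omits the orientation-consistency check, only validating that each tile has a known orientation.

```python
from typing import NamedTuple

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

class Tile(NamedTuple):
    component: str    # component accession.version (hypothetical IDs below)
    start: int        # 1-based, inclusive, in component coordinates
    stop: int         # 1-based, inclusive
    orientation: str  # "+" or "-"

def instantiate(tiles: list[Tile], components: dict[str, str]) -> str:
    """Join the selected part of each component; tile boundaries are the switch points."""
    pieces = []
    for tile in tiles:
        if tile.orientation not in ("+", "-"):
            raise ValueError(f"bad orientation for {tile.component}: {tile.orientation!r}")
        piece = components[tile.component][tile.start - 1 : tile.stop]
        pieces.append(piece if tile.orientation == "+" else reverse_complement(piece))
    return "".join(pieces)

# Two overlapping components: CMP_B.1 begins with the last four bases of CMP_A.1 ("GGTT").
components = {"CMP_A.1": "AACCGGTT", "CMP_B.1": "GGTTACGT"}
# Switch point: leave CMP_A.1 after its base 6 and resume at the matching CMP_B.1 base 3.
tiles = [Tile("CMP_A.1", 1, 6, "+"), Tile("CMP_B.1", 3, 8, "+")]
print(instantiate(tiles, components))  # AACCGGTTACGT
```

Because the two components agree across their overlap, any switch point inside it yields the same consensus; choosing it matters when the overlapping components differ, which is where a mis-chosen switch point can introduce or hide a variant.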

nsv832911 (nstd68) Submitted on NCBI35 (hg17)

NCBI35 (hg17) Tiling Path

GRCh37 (hg19) Tiling Path

Gap Inserted

Moved approximately 2 Mb distal on chr15

NC_000015.8 (chr15)

NC_000015.9 (chr15)

Removed from assembly

Added to assembly
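Whether a variant such as nsv832911 survives a change of assembly depends on whether its interval still aligns to the new sequence. Tools such as NCBI's Remap service or UCSC liftOver do this with full assembly-to-assembly alignments; the sketch below is only a toy version, assuming hypothetical ungapped alignment blocks and requiring the whole interval to fall inside a single block. An interval that crosses a block edge (for example, at an inserted gap) or lies in sequence removed from the assembly fails to map.

```python
from typing import NamedTuple, Optional

class Block(NamedTuple):
    old_start: int  # 1-based, inclusive, on the old assembly
    old_stop: int
    new_start: int  # position of old_start on the new assembly (same strand assumed)

def remap_interval(start: int, stop: int, blocks: list[Block]) -> Optional[tuple[int, int]]:
    """Map [start, stop] through ungapped alignment blocks, or return None if it cannot be placed."""
    for block in blocks:
        if block.old_start <= start and stop <= block.old_stop:
            offset = block.new_start - block.old_start
            return (start + offset, stop + offset)
    # Interval crosses a block edge or lies in sequence absent from the new assembly.
    return None

# Invented coordinates: the second block moves about 2 Mb distal;
# old positions after 1,500,000 have no block and are treated as removed.
blocks = [Block(1, 1_000_000, 1), Block(1_000_001, 1_500_000, 3_000_001)]
print(remap_interval(1_200_000, 1_210_000, blocks))  # (3200000, 3210000)
print(remap_interval(1_600_000, 1_610_000, blocks))  # None
print(remap_interval(999_000, 1_001_000, blocks))    # None
```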

HG-24

Sequences from haplotype 1, Sequences from haplotype 2

Old Assembly model: compress into a consensus

New Assembly model: represent both haplotypes

AC074378.4, AC079749.5

AC134921.2, AC147055.2

AC140484.1, AC019173.4

AC093720.2, AC021146.7

NCBI36 NC_000004.10 (chr4) Tiling Path

Xue Y et al., 2008

TMPRSS11E, TMPRSS11E2

GRCh37 NC_000004.11 (chr4) Tiling Path

AC074378.4, AC079749.5

AC134921.1, AC147055.2

AC093720.2, AC021146.7

TMPRSS11E

GRCh37: NT_167250.1 (UGT2B17 alternate locus)

AC074378.4, AC140484.1

AC019173.4, AC226496.2

AC021146.7

TMPRSS11E2
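Under the newer model, the primary assembly carries one haplotype through a region such as UGT2B17, and the other is kept as a separate alternate-locus scaffold placed against it (NT_167250.1 here). One practical consequence, sketched below with hypothetical names, is that a variant lookup over such a region should consider every sequence that represents it, not only the primary chromosome.

```python
from typing import NamedTuple

class AltLocus(NamedTuple):
    accession: str        # alternate-locus scaffold, e.g. "NT_167250.1"
    chromosome: str       # primary chromosome it is placed on
    placement_start: int  # primary-assembly region it provides an alternative for
    placement_stop: int

def sequences_for_region(chrom: str, start: int, stop: int,
                         alt_loci: list[AltLocus]) -> list[str]:
    """List the primary chromosome plus every alternate locus whose placement overlaps the region."""
    hits = [chrom]
    for alt in alt_loci:
        if alt.chromosome == chrom and alt.placement_start <= stop and start <= alt.placement_stop:
            hits.append(alt.accession)
    return hits

# Placement coordinates are invented for illustration; they are not the real UGT2B17 placement.
alt_loci = [AltLocus("NT_167250.1", "chr4", 100_000, 250_000)]
print(sequences_for_region("chr4", 200_000, 210_000, alt_loci))  # ['chr4', 'NT_167250.1']
print(sequences_for_region("chr4", 300_000, 310_000, alt_loci))  # ['chr4']
```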

nsv532126 (nstd37)

GRCh37

81 FIX patches, 71 NOVEL patches

GRCh37.p9

Dennis et al., 2012

1q32, 1q21, 1p21

1p21 patch alignment to chromosome 1

Finding the data

How dbVar* manages data

*and most other NCBI databases too

Object   Method      Analysis                Clinical assertion  NCBI36 location  Etc…
nsv1000  Oligo aCGH  Probe signal intensity  None                Location         Etc…
nsv2000  Sequencing  Paired-end analysis     None                Location         Etc…
nsv3000  Sequencing  Read depth              Benign              Location         Etc…
…        …           …                       …                   …                …
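The point of the table is that each object carries many independently searchable attributes, of which location is only one. A minimal in-memory sketch of querying such records by attribute follows; the Record type and select helper are hypothetical, and the rows simply copy the placeholder rows of the table.

```python
from typing import NamedTuple, Optional

class Record(NamedTuple):
    accession: str
    method: str
    analysis: str
    clinical: Optional[str]  # None when there is no clinical assertion

RECORDS = [
    Record("nsv1000", "Oligo aCGH", "Probe signal intensity", None),
    Record("nsv2000", "Sequencing", "Paired-end analysis", None),
    Record("nsv3000", "Sequencing", "Read depth", "Benign"),
]

def select(records: list[Record], **criteria: object) -> list[str]:
    """Return accessions of records whose attributes equal every given criterion."""
    return [r.accession for r in records
            if all(getattr(r, name) == value for name, value in criteria.items())]

print(select(RECORDS, method="Sequencing"))                     # ['nsv2000', 'nsv3000']
print(select(RECORDS, method="Sequencing", clinical="Benign"))  # ['nsv3000']
print(select(RECORDS, clinical=None))                           # ['nsv1000', 'nsv2000']
```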

Search Term

Variant submitted on NCBI35 (hg17); failed to remap to NCBI36 (hg18); successfully remapped to GRCh37 (hg19)

No results in the ‘normal’ dbVar search; Genome Sensor predicts that the term is a location and points to the dbVar Genome Browser
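Routing a query like this depends on recognizing that the search term is a genomic location rather than a keyword. Genome Sensor's actual rules are not described here; the sketch below is a hypothetical stand-in that accepts terms such as chr15:20,000,000-20,100,000 and reports whether a term should go to a browser view or to text search.

```python
import re
from typing import NamedTuple, Optional

class Location(NamedTuple):
    chrom: str
    start: int
    stop: int

# Hypothetical pattern: optional "chr", a loosely matched chromosome name
# (1-2 digits, X, Y, M or MT), then start-stop with optional thousands commas.
LOCATION_RE = re.compile(r"^(?:chr)?(\d{1,2}|X|Y|MT?)\s*:\s*(\d[\d,]*)\s*-\s*(\d[\d,]*)$",
                         re.IGNORECASE)

def parse_location(term: str) -> Optional[Location]:
    match = LOCATION_RE.match(term.strip())
    if match is None:
        return None
    start, stop = (int(g.replace(",", "")) for g in match.group(2, 3))
    if start > stop:
        return None
    return Location("chr" + match.group(1).upper(), start, stop)

def route(term: str) -> str:
    """Send location-like terms to a genome browser view, everything else to text search."""
    return "genome browser" if parse_location(term) else "text search"

print(parse_location("chr15:20,000,000-20,100,000"))
# Location(chrom='chr15', start=20000000, stop=20100000)
print(route("autism"))  # text search
```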

Acknowledgements

dbVar

John Lopez, Tim Hefferon, John Garner, Chao Chen, George Zhou, Victor Ananiev

NCBI

Collaborators

DGVa, DGV

GRC, NCBI

Valerie Schneider, Nathan Bouk, Hsiu-Chuan Chen

Collaborators

TGI-WU, WTSI, EBI

ISCA, NCBI Genomes, Viewers and Variation groups
