
Genomic Duplications, Structural Variation and Disease


Page 1: Genomic Duplications, Structural Variation and  Disease

Genomic Duplications, Structural Variation and Disease

Evan Eichler, Howard Hughes Medical Institute

University of Washington

April 3rd, 2006, Frontiers in Genomics

Page 2: Genomic Duplications, Structural Variation and  Disease

Genomic Variation

• Single base-pair changes: point mutations

• Small insertions/deletions: frameshift, microsatellite, minisatellite

• Mobile elements: retroelement insertions (300 bp to 10 kb in size)

• Large-scale genomic variation (>10 kb)

– Large-scale Deletions

– Segmental Duplications

• Chromosomal variation: translocations, inversions, fusions

Mutational mechanisms underlying genetic variation?

[Figure: resolution scale from sequence to cytogenetics]

Page 3: Genomic Duplications, Structural Variation and  Disease

Global Analysis of Segmental Duplications

Approaches:

• Computational: a) whole-genome assembly comparison; b) whole-genome shotgun sequence detection strategies

• Experimental: comparative sequence analysis, array comparative genomic hybridization, comparative FISH

[Figure: segmental duplications, interchromosomal and intrachromosomal]

Question: What is the organization, mechanism and impact of recent human segmental duplications (>90% identity and >1 kb in length)?
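The working definition above (identity >90%, length >1 kb) and the interchromosomal/intrachromosomal split can be expressed as a simple filter. This is a minimal sketch, assuming a hypothetical `Alignment` record for each pairwise alignment; it covers only the thresholding step, not the detection pipelines used in the study.

```python
from dataclasses import dataclass

# Thresholds from the slide: ">90% and >1 kb in length".
MIN_IDENTITY = 0.90
MIN_LENGTH_BP = 1_000


@dataclass(frozen=True)
class Alignment:
    """Hypothetical record for one pairwise alignment between two genomic copies."""
    chrom_a: str
    start_a: int
    end_a: int
    chrom_b: str
    start_b: int
    end_b: int
    identity: float  # fraction of identical aligned bases (0..1)


def is_segmental_duplication(aln: Alignment) -> bool:
    """True if both copies are longer than 1 kb and identity exceeds 90%."""
    shorter = min(aln.end_a - aln.start_a, aln.end_b - aln.start_b)
    return aln.identity > MIN_IDENTITY and shorter > MIN_LENGTH_BP


def classify(aln: Alignment) -> str:
    """Intrachromosomal if both copies lie on the same chromosome."""
    return "intrachromosomal" if aln.chrom_a == aln.chrom_b else "interchromosomal"


alns = [
    Alignment("chr7", 0, 5_000, "chr7", 100_000, 105_000, 0.97),
    Alignment("chr1", 0, 800, "chr2", 0, 800, 0.99),  # too short to qualify
]
for aln in alns:
    print(is_segmental_duplication(aln), classify(aln))
# True intrachromosomal
# False interchromosomal
```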

Page 4: Genomic Duplications, Structural Variation and  Disease

Recent Duplication Architecture of the Human Genome

•Total: 5.26% (150.8 Mb)
•Inter: 2.36% (67.6 Mb)
•Intra: 3.87% (111.1 Mb)
•Non-random distribution
•5.3-fold bias to pericentromere
•389 regions >100 kb (nexi)

[Figure: map of recent duplications and "heterochromatic" regions across chromosomes 1-22, X and Y (build34, >90%, >1 kb; scale 10-250 Mb). Labeled loci include alpha satellite, 4p16.1, 4p16.3, 7q36, 2p22, 11p15, 10q26, 12q24, Xq28, 4q24, 22q12, 12p11, 11q14, 21q21 and 2p11 (700 kb).]

Page 5: Genomic Duplications, Structural Variation and  Disease

Human Genome Segmental Duplication Pattern

[Figure: genome-wide segmental duplication pattern, chr1-chr22, chrX, chrY]

•~4% duplication (>20 kb, >95% identity)
•~4 duplicates on average
•59.5% of pairwise alignments separated by >1 Mb

http://humanparalogy.gs.washington.edu; She, X et al. (2004), Nature

Page 6: Genomic Duplications, Structural Variation and  Disease

Mouse Segmental Duplication Pattern

•1-2% duplication (>20 kb, >95% identity)
•2-3 duplicates on average
•July 2004, mmu5

She, X, in press

Page 7: Genomic Duplications, Structural Variation and  Disease

Percent Similarity of Human Segmental Duplications

[Figure: sum of aligned bases (kb) by percent identity (90-100%), interchromosomal vs. intrachromosomal; whole-genome analysis (2,865 Mb), Build 34, July 2003, 25.8 K alignments. Annotations: 25 My, 12 My, 5 My; 49 Mb.]

Page 8: Genomic Duplications, Structural Variation and  Disease

Summary: Segmental Duplication Asymmetry

[Figure: human vs. chimpanzee duplication comparison. Labeled components: 24.8 Mb + new, 6.6 Mb + shared; 21.7 Mb + new, 7.2 Mb + shared; 16.0 Mb + shared (chimp hyperexpansion). Polymorphism 15-20%.]

•76.3 Mb of Differentially Duplicated Euchromatic Material
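As a quick arithmetic check, the five components labeled in the figure sum to the 76.3 Mb total. The second step assumes, as an interpretation rather than something the slide states, that the 2.7% human/chimpanzee figure listed under Human Segmental Duplications Properties is this total divided by the 2,865 Mb covered by the whole-genome analysis.

```python
# Component sizes (Mb) as labeled in the figure.
components_mb = [24.8, 6.6, 21.7, 7.2, 16.0]

total_mb = round(sum(components_mb), 1)
print(f"{total_mb} Mb")  # 76.3 Mb

# Assumption: the 2.7% figure is this total relative to the 2,865 Mb analyzed.
share_pct = round(100 * total_mb / 2865, 1)
print(f"{share_pct}%")  # 2.7%
```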

Page 9: Genomic Duplications, Structural Variation and  Disease

Hyperexpansion of a Chimpanzee Segmental Duplication

4 to 400 copies

Cheng, Z et al. (2005), Nature

Page 10: Genomic Duplications, Structural Variation and  Disease

Human Segmental Duplications Properties

• Large (>10 kb)

• Recent (>95% identity)

• Interspersed (60% are separated by more than 1 Mb)

• Modular (duplicon architecture) ~389 acceptor regions

• 2.7% Genetic Difference, human vs. chimpanzee

What impact in terms of human variation?

Page 11: Genomic Duplications, Structural Variation and  Disease

Models of Disease

• Rare Duplication-mediated Structural Variation

• Common Fine-Scale Structural Variation


Page 12: Genomic Duplications, Structural Variation and  Disease

Genomic Disorders

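[Figure: misaligned homologous chromosomes (TEL, A B C) undergo aberrant recombination between flanking duplications in the gametes, producing deletion and duplication products that lead to human disease.]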

Triplosensitive, Haploinsufficient and Imprinted Genes

•Hypothesis: Mechanism underlying Uncharacterized Mental Retardation?
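The aberrant recombination cartoon can be mimicked with a toy model in which each chromosome is a list of segment labels. This sketch assumes a hypothetical duplicon label `D` flanking the A B C interval in direct orientation and treats the event as non-allelic homologous recombination (NAHR, the term used later in this deck); it shows only why one misaligned crossover yields reciprocal deletion and duplication products, not real breakpoint mechanics.

```python
def unequal_crossover(h1: list[str], h2: list[str], i: int, j: int):
    """Cross over between paralogous duplicon copies h1[i] and h2[j].

    Toy model: the breakpoint falls inside the duplicon, so each
    recombinant product keeps one copy at the junction.
    """
    if h1[i] != h2[j]:
        raise ValueError("crossover requires two copies of the same duplicon")
    rec1 = h1[:i] + h2[j:]  # h1 up to its copy, then h2 from its copy onward
    rec2 = h2[:j] + h1[i:]  # h2 up to its copy, then h1 from its copy onward
    return rec1, rec2


# "D" is a hypothetical label for the directly oriented duplicon flanking A B C.
homolog = ["TEL", "D", "A", "B", "C", "D", "CEN"]

# Misalignment: the proximal D (index 5) of one homolog pairs with the
# distal D (index 1) of the other.
dup, deleted = unequal_crossover(homolog, homolog, 5, 1)
print(dup)      # ['TEL', 'D', 'A', 'B', 'C', 'D', 'A', 'B', 'C', 'D', 'CEN']
print(deleted)  # ['TEL', 'D', 'CEN']
```

Both products arise from a single event, which is why the same duplication architecture can predispose to reciprocal deletion and duplication syndromes.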

Page 13: Genomic Duplications, Structural Variation and  Disease

Duplication-Mediated Disease

Genomic Disorder | Brain | Congenital Anomalies | Locus | Interval (kb) | LCR Size (kb) | Duplicon | % Identity | Incidence (%) | Incidence in MR (%)
Williams-Beuren syndrome | Severe MR | craniofacial, heart disease | 7q11.23 | 1,600 | >320 | PMS2/GTF2I | 96-99 | 0.01 | 0.5
Prader-Willi syndrome | Severe MR | small hands, feet, hypotonia, obesity, short stature | 15q11.2-q13 | 3500 | 400 | HERC2 | 92-99 | 0.007 | 0.35
Angelman syndrome | Severe MR | microcephaly, hypotonia, seizures | 15q11.2-q13 | 3500 | 400 | HERC2 | 92-99 | 0.007 | 0.35
Smith-Magenis syndrome | Severe MR | craniofacial, peripheral neuropathy | 17p11.2 | 4000 | 200 | SMSREP | 98.2-99 | 0.004 | 0.2
dup17p11.2 | Mild MR | peripheral neuropathy | 17p11.2 | 4000 | 200 | SMSREP | 98.2-99 | 0.001 | 0.05
Velocardiofacial syndrome | Mild MR | cardiac, craniofacial defects | 22q11.2 | ~3000 | ~300 | LCR22 | 98-99 | 0.03 | 0.7
Cat Eye syndrome | Severe MR | craniofacial, coloboma | 22q11 | 3000 | 400 | LCR22 | 98-99 | 0.003 | 0.15
Inv dup(15) | Mild/Severe | mild facial, seizures | 15q11/q14 | 4000 | 400 | HERC2 | 98 | 0.01 | 0.5
Neurofibromatosis | Mild MR | fibromatous tumours, visual defects | 17q11.2 | 1500 | 85 | NF1REP | 98.4 | 0.003 | 0.03
CMT1A | No MR | peripheral neuropathy | 17p12 | 1400 | 24 | CMT1A-REP | 98.7 | 0.01 | NA
HNPP | No MR | peripheral neuropathy | 17p12 | 1400 | 24 | CMT1A-REP | 98.7 | 0.001 | NA
Total | | | | | | | | 0.089 | 2.80

Page 14: Genomic Duplications, Structural Variation and  Disease

Duplication Map of Human Genome

•130 candidate regions (298 Mb)
•23 associated with genetic disease
•Target patients by array CGH

Bailey et al. (2002), Science 297:1003-1007

Page 15: Genomic Duplications, Structural Variation and  Disease

Array Comparative Genomic Hybridization

•High-throughput detection of large-scale variation (>50 kb): large-scale copy variation (LCV) or copy-number polymorphism (CNP), i.e. deletions and duplications (Iafrate et al., 2004; Sebat et al., 2004).

[Figure: a normal human DNA sample and a disease individual DNA sample, labeled in the Cy3 and Cy5 channels, are hybridized to a 12 mm array of human BAC clones and the two channels merged.]
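The two-channel readout is usually summarized per BAC as a log2 intensity ratio. The sketch below also combines a reciprocal dye-swap pair (as in the experimental design) by subtracting the swapped ratio and halving, which cancels a constant multiplicative dye bias. Which sample received which dye is an assumption here, and the function names are illustrative.

```python
import math


def log2_ratio(cy5: float, cy3: float) -> float:
    """Log2 of the Cy5/Cy3 intensity ratio for one BAC spot."""
    return math.log2(cy5 / cy3)


def dye_swap_log2(cy5_a: float, cy3_a: float, cy5_b: float, cy3_b: float) -> float:
    """Combine a reciprocal dye-swap pair into one log2(test/reference) value.

    Assumption: in hybridization A the test (patient) DNA is in Cy5 and the
    reference in Cy3; in hybridization B the labels are swapped, so its
    ratio is log2(reference/test) and is subtracted.
    """
    return (log2_ratio(cy5_a, cy3_a) - log2_ratio(cy5_b, cy3_b)) / 2


# Single-copy loss (test/reference = 0.5) read with a Cy5 signal 1.2x too bright:
# A: Cy5 = 500 * 1.2, Cy3 = 1000;  B: Cy5 = 1000 * 1.2, Cy3 = 500.
print(round(dye_swap_log2(600, 1000, 1200, 500), 6))  # -1.0
```

For a diploid reference, a single-copy loss in the test sample is expected near log2(1/2) = -1 and a single-copy gain near log2(3/2) ≈ 0.58.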

Page 16: Genomic Duplications, Structural Variation and  Disease

Duplication Microarray: Experimental Design

[Figure: BACs tiled across a duplication-flanked region (TEL)]

•130 regions of the human genome
•2178 BACs, or on average ~10-12 BACs per region
•Perform array CGH with reciprocal dye-swap experiments
•Strategy: identify normal variation, then search for variation observed only in disease patients

Region criteria: flanking duplications separated by >50 kb and <5 Mb; duplication properties: 95% identity, 10 kb
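One reading of the region criteria above: two intrachromosomal copies of a duplication with at least 95% identity and at least 10 kb in length, separated by more than 50 kb and less than 5 Mb, define a candidate rearrangement region between them. The `DuplicationPair` record and the use of inclusive minimums are assumptions for illustration, not the published selection code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DuplicationPair:
    """Hypothetical record: two intrachromosomal copies of one duplication."""
    chrom: str
    start1: int
    end1: int
    start2: int
    end2: int
    identity: float  # 0..1


def candidate_region(pair: DuplicationPair,
                     min_identity: float = 0.95,
                     min_length: int = 10_000,
                     min_gap: int = 50_000,
                     max_gap: int = 5_000_000) -> Optional[tuple[str, int, int]]:
    """Return (chrom, start, end) of the span between the two copies, or None."""
    if pair.identity < min_identity:
        return None
    if min(pair.end1 - pair.start1, pair.end2 - pair.start2) < min_length:
        return None
    inner_start = min(pair.end1, pair.end2)    # end of the left-hand copy
    inner_end = max(pair.start1, pair.start2)  # start of the right-hand copy
    if min_gap < inner_end - inner_start < max_gap:
        return (pair.chrom, inner_start, inner_end)
    return None  # copies overlap, sit too close, or lie too far apart


p = DuplicationPair("chr17", 1_000_000, 1_024_000, 2_400_000, 2_424_000, 0.987)
print(candidate_region(p))  # ('chr17', 1024000, 2400000)
```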

Page 17: Genomic Duplications, Structural Variation and  Disease

Hybridization

[Figure: log2 relative hybridization intensity (-2 to 2) across BAC probes 1-20 for samples R921, D3767 and R1080; probe groups 1-3, 4-5, 6, 7-14, 15 and 16-20.]
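Calls in this design rest on adjacent BACs shifting together. A minimal run-based caller is sketched below, assuming probes in genomic order, a hypothetical ±0.3 log2 threshold, and a minimum of 3 consecutive probes (one reading of the "defined by >2 BACs" criterion used for the patient screen). The study's actual calling rules are not given here.

```python
def call_runs(log2_ratios, threshold=0.3, min_run=3):
    """Return (first_probe, last_probe, state) for runs of adjacent BAC probes.

    A probe is 'gain' if its log2 ratio >= threshold and 'loss' if
    <= -threshold; a run is reported only if it spans >= min_run probes.
    """
    def state(x):
        if x >= threshold:
            return "gain"
        if x <= -threshold:
            return "loss"
        return None

    calls = []
    run_start, run_state = 0, None
    # A trailing None acts as a sentinel that closes any run at the array end.
    for i, s in enumerate([state(x) for x in log2_ratios] + [None]):
        if s != run_state:
            if run_state is not None and i - run_start >= min_run:
                calls.append((run_start, i - 1, run_state))
            run_start, run_state = i, s
    return calls


ratios = [0.0, 0.1, -0.05, 0.6, 0.7, 0.55, 0.0, -0.8, -0.9, 0.1, -1.0, -1.1, -0.9]
print(call_runs(ratios))  # [(3, 5, 'gain'), (10, 12, 'loss')]
```

The two-probe loss at probes 7-8 is not reported, since it falls short of `min_run`.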

Page 18: Genomic Duplications, Structural Variation and  Disease

Study Populations

•Normal unaffected individuals (diversity panel and HapMap samples): target 800 samples; completed 75 + 269 = 344 samples; identified 257 additional CNPs.

•Idiopathic mental retardation (IMR): target 900 samples (400 Flint samples, 500 CWRU samples); 291 completed

Page 19: Genomic Duplications, Structural Variation and  Disease

Normal Large-Scale Genomic Structural Variation

•Based on our analysis of ~568 chromosomes, ~40 of the 130 hotspots show no variation: resistant to non-allelic homologous recombination (NAHR), or under selection?

Page 20: Genomic Duplications, Structural Variation and  Disease

Validation using NimbleGen Arrays

Duplication

Deletion

Locke et al., unpublished

Page 21: Genomic Duplications, Structural Variation and  Disease

Deletion Variants Appear Less Common


Page 23: Genomic Duplications, Structural Variation and  Disease

~3.0 Mb deletion observed in IMR26 (the common VCF 22q11 deletion)

VCF Deletion detected in IMR26

Page 24: Genomic Duplications, Structural Variation and  Disease

CNP detected by Seg Dup array and Iafrate et al.

CNPs detected by Seg Dup array in HapMap samples

Novel ~2.5Mb deletion only observed in IMR

Novel LCV/CNP Detected in IMR43

Sharp et al., unpublished

Page 25: Genomic Duplications, Structural Variation and  Disease

Novel 2.5Mb Chr1 deletion in IMR43

Page 26: Genomic Duplications, Structural Variation and  Disease

Variation in IMR

•7/9 events are de novo

•New Genomic Disorder Candidates

•23 (n=31 patients) novel sites of variation defined by >2 BACs

•291 IMR samples (Oxford Cohort) screened to date

Table: Novel Structural Variants and Mental Retardation

Event | Size (Mb) | # of Patients | De Novo
17q deletion | 0.45 | 4 | Yes
1p deletion | 1.4 | 3 | NN
15q deletion* | 3.3 | 2 | NN
19q deletion | 0.4 | 2 | Yes
2p deletion | 0.7 | 2 | Yes
1q deletion | 2.5 | 1 | Yes
16p duplication | 2.3 | 1 | NN
22q deletion** | 2.4 | 1 | Yes
16p duplication | 2.8 | 1 | No
Xq duplication | 0.2 | 1 | No
13p duplication | 0.2 | 1 | NN
22q duplication | 3 | 1 | NN
17p duplication | 3.8 | 1 | Yes
22q duplication | 0.3 | 1 | NN
2p deletion | 0.6 | 1 | 1P
10p deletion | 3 | 1 | NN
12p deletion | 0.4 | 1 | 1P
15q duplication | 4 | 1 | NN
13q duplication | 1.5 | 1 | NN
6p duplication | 6.1 | 1 | 1P
6q duplication | 6.2 | 1 | Yes
1p duplication | 0.6 | 1 | NN
1p duplication | 0.9 | 1 | NN

*Prader-Willi/Angelman syndrome; **VCF/DGS; NN = not known

•5 are seen in more than one unrelated patient
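The summary bullets can be re-derived from the table. This sketch transcribes the rows and counts sites, patients, recurrent events and de novo calls; it assumes the 7/9 figure counts only rows marked Yes or No, leaving the NN and undefined 1P codes out of the denominator.

```python
# (event, size in Mb, number of patients, de novo status) from the table.
EVENTS = [
    ("17q deletion", 0.45, 4, "Yes"),
    ("1p deletion", 1.4, 3, "NN"),
    ("15q deletion", 3.3, 2, "NN"),
    ("19q deletion", 0.4, 2, "Yes"),
    ("2p deletion", 0.7, 2, "Yes"),
    ("1q deletion", 2.5, 1, "Yes"),
    ("16p duplication", 2.3, 1, "NN"),
    ("22q deletion", 2.4, 1, "Yes"),
    ("16p duplication", 2.8, 1, "No"),
    ("Xq duplication", 0.2, 1, "No"),
    ("13p duplication", 0.2, 1, "NN"),
    ("22q duplication", 3.0, 1, "NN"),
    ("17p duplication", 3.8, 1, "Yes"),
    ("22q duplication", 0.3, 1, "NN"),
    ("2p deletion", 0.6, 1, "1P"),
    ("10p deletion", 3.0, 1, "NN"),
    ("12p deletion", 0.4, 1, "1P"),
    ("15q duplication", 4.0, 1, "NN"),
    ("13q duplication", 1.5, 1, "NN"),
    ("6p duplication", 6.1, 1, "1P"),
    ("6q duplication", 6.2, 1, "Yes"),
    ("1p duplication", 0.6, 1, "NN"),
    ("1p duplication", 0.9, 1, "NN"),
]

sites = len(EVENTS)
patients = sum(n for _, _, n, _ in EVENTS)
recurrent = sum(1 for _, _, n, _ in EVENTS if n > 1)
tested = [status for *_, status in EVENTS if status in ("Yes", "No")]
de_novo = tested.count("Yes")

print(sites, patients, recurrent, f"{de_novo}/{len(tested)}")  # 23 31 5 7/9
```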

Page 27: Genomic Duplications, Structural Variation and  Disease

Problems:

•Array CGH has a lower limit for detecting deletions (~30 kb)

•Oligo-based approaches effectively sample a small fraction of the genome and extrapolate size indirectly

1. Precise location of the rearrangement is unknown.

2. Neither can identify subtle (5-30 kb) variation.

3. Neither approach can detect inversions.

4. Location and structure of the change unknown.

Page 28: Genomic Duplications, Structural Variation and  Disease

Models of Disease

• Rare Duplication-mediated Structural Variation

• Common Fine-Scale Structural Variation

Page 29: Genomic Duplications, Structural Variation and  Disease

Intermediate-Size Structural Variation (ISV) and Inversions

Gene | Type | Freq. | Locus | Size | Dup | Phenotype
SMN2 | Duplication | 50% +++/- | 5q13 | >100 kb | 88.7/99.8% | SMA susceptibility
CYP2A6 | Duplication | 1.3% +/- | 19q13.2 | 7 kb | 24 kb/96.2% | nicotine metabolism
CYP21A2 | Duplication | 1.6% +/- | 6p21.3 | 35 kb | 0 | congenital adrenal hyperplasia
CYP2D6 | Duplication | 1-29% +++ | 22q13.1 | 5 kb | 5.4 kb/91-97% | antidepressant resistance
GSTM1 | Deletion | 50% -/- | 1p13.3 | 18 kb | 24 kb/95.6% | toxin resistance, cancer susceptibility
IGVH26 | Deletion/Dup | 4-15% +/- | 14q32.3 | Variable | 91-97% | immune response
EMD/FLN | Inversion | 33% -/+ | Xq28 | 219 kb | 48 kb/99% | none
DEF3A-OR | Inversion | 26% -/+ | 8p23 | 5 Mb | 400 kb/98.9% | heart defect susceptibility
GSTT1 | Deletion | 20% -/- | 22q11.2 | 54.3 kb | 17 kb/94% | halothane/epoxide sensitivity

Adapted from Buckland, Ann Med

Page 30: Genomic Duplications, Structural Variation and  Disease

Comparing Human Genomes by Paired-End Sequence

•~1.1 million fosmid paired ends were sequenced by MIT to facilitate gap closure during the final phases of the HGP

•Fosmid insert size is tightly distributed around the mean (40 ± 2.6 kb); low copy number gives stability; capillary sequencing gives a low mispairing rate

•Derived from a single female donor PDR cell line

•Approach: optimal placement of fosmid ends against human genome could theoretically detect rearrangements:

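[Figure: fosmid end pairs placed against Build35. Concordant: > < at the expected distance; Insertion: > < closer than expected; Deletion: > < farther apart than expected; Inversions: < < (both ends in the same orientation).]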

Dataset: 1,122,408 fosmid pairs preprocessed (15.5X genome coverage); 639,204 fosmid pairs retained as BEST pairs (8.8X genome coverage)
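The coverage figures follow from clone (physical) coverage: number of pairs times mean insert size, divided by genome size. Assuming a genome size of 2,900 Mb (not stated on the slide) and the 40 kb mean insert, both quoted values are reproduced.

```python
def clone_coverage(n_pairs: int, mean_insert_kb: float, genome_mb: float) -> float:
    """Physical (clone) coverage: total insert length divided by genome size."""
    return n_pairs * mean_insert_kb / (genome_mb * 1_000)


GENOME_MB = 2_900  # assumed haploid genome size; not stated on this slide

print(round(clone_coverage(1_122_408, 40, GENOME_MB), 1))  # 15.5
print(round(clone_coverage(639_204, 40, GENOME_MB), 1))    # 8.8
```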

Page 31: Genomic Duplications, Structural Variation and  Disease

Genome-wide Detection of Structural Variation (>8 kb)

•<32 kb: putative insertion
•>48 kb: putative deletion

[Figure: fosmid placements discordant by orientation (yellow/gold) or by size (red), shown with a duplication track; panels a) insertion, b) deletion, c) inversion. Structural polymorphisms?]
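The placement logic can be sketched as a classifier over the two mapped ends of each fosmid. The 32 kb and 48 kb cut-offs come from the slide and sit at roughly the 40 kb mean ± 3 standard deviations (3 × 2.6 kb = 7.8 kb). The `EndPlacement` record, the use of clone-boundary coordinates, and the extra "interchromosomal" and "other discordant" labels are assumptions added for completeness.

```python
from dataclasses import dataclass

MIN_SPAN = 32_000  # shorter mapped span: putative insertion in the fosmid
MAX_SPAN = 48_000  # longer mapped span: putative deletion in the fosmid


@dataclass(frozen=True)
class EndPlacement:
    """Hypothetical record for one fosmid end placed on the reference."""
    chrom: str
    pos: int     # outer clone boundary implied by this end (assumed)
    strand: str  # "+" drawn as ">", "-" drawn as "<"


def classify_pair(a: EndPlacement, b: EndPlacement) -> str:
    if a.chrom != b.chrom:
        return "interchromosomal"      # not one of the slide's categories
    left, right = sorted((a, b), key=lambda e: e.pos)
    if left.strand == right.strand:
        return "inversion"             # > > or < <
    if left.strand == "-":
        return "other discordant"      # < > : ends point away from each other
    span = right.pos - left.pos
    if span < MIN_SPAN:
        return "insertion"
    if span > MAX_SPAN:
        return "deletion"
    return "concordant"


print(classify_pair(EndPlacement("chr1", 100_000, "+"),
                    EndPlacement("chr1", 120_000, "-")))  # insertion
```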

Page 32: Genomic Duplications, Structural Variation and  Disease

Validated Structural Polymorphisms

GSTM1 ~20 kb deletion
•minspread 28 kb (9 fosmids)
•50% of Caucasians/Saudis are -/- for the 18 kb gene (predisposition to cancer)
•+++: ultrarapid GSTM1 activity

CYP2D6 ~5-10 kb insertion
•minspread 17 kb (7 fosmids)
•Alternate haplotype support
•1-29% of Caucasians/Japanese have multiple copies (entire gene ~5 kb)
•Associated with resistance to antipsychotic tricyclic antidepressants

Page 33: Genomic Duplications, Structural Variation and  Disease

Summary: 6/16 of common polymorphisms detected

Table 2: Validated Structural Polymorphisms.

Gene | Type | Frequency | Locus | Expected | Detected | Phenotype
GSTT1 | Deletion | 20% -/- | 22q11.2 | 54.3 kb | 40 kb | halothane/epoxide sensitivity
EMD/FLN | Inversion | 33% -/+ | Xq28 | 219 kb | 34 kb | none
GSTM1 | Deletion | 50% -/- | 1p13.3 | 18 kb | 17.8 kb | toxin resistance, cancer susceptibility
CYP2D6 | Duplication | 1-29% ++++ | 22q13.1 | 5 kb × n | 10 kb | antidepressant sensitivity
CYP21A2 | Duplication | 1.6% +/- | 6p21.3 | 35 kb | 22.1 kb | congenital adrenal hyperplasia
LPA | VNTR | 94% H | 6q27 | 5.5 kb × n | 14 kb | coronary heart disease risk
RHD | Deletion | 15-20% -/- | 1p36.11 | ~60 kb | 67.8 kb | Rhesus blood group sensitivity

Tuzun et al. (2005) Nat. Genet

Page 34: Genomic Duplications, Structural Variation and  Disease

……Sequence the Structural Variation

Page 35: Genomic Duplications, Structural Variation and  Disease

Putative Insertion (8,384 bp)

[Figure: alignment of build34 and fosmid sequence]

Page 36: Genomic Duplications, Structural Variation and  Disease

Putative Deletion (14,055 bp)

[Figure: alignment of build34 and fosmid sequence]

Page 37: Genomic Duplications, Structural Variation and  Disease

G248 Fosmid Sequencing Results

Event | Fosmids* | Confirmed** | Invariant | Variant type: Binary | Variant type: Tandem CNV | Variant type: Complex
Deletion | 74 | 69 | 5 | 42 | 12 | 15
Insertion | 49 | 37 | 12 | 16 | 12 | 9
Inversion | 27 | 22 | 5 | 10 | 0 | 12
Total | 150 | 128 | 22 | 68 | 24 | 36

Sequencing Genic Structural Variation

[Figure: panels a-f comparing b35 and fosmid sequence at genic structural variants, including SIGLEC5A, LSP1, TNNT3, KCNJ2, KCNJ16, GSTT2, DDT and MEGF11.]

Page 38: Genomic Duplications, Structural Variation and  Disease

Gene Families and Structural Variants

Drug detoxification: glutathione-S-transferase, cytochrome P450, carboxylesterases

Immune response and inflammation: leukocyte immunoglobulin-like receptor, defensin, phorbolin

Surface integrity genes: mucin, enamelin, late epidermal cornified envelope genes, galectin

Surface antigens: melanoma antigen gene family, rhesus antigen

Environmental Interaction Genes.

Page 39: Genomic Duplications, Structural Variation and  Disease

Fine-Scale Structural Variation Map (build35 vs. Fosmids)

•1.3% discordant fosmids
•295 clusters identified (2 or more fosmids)
•246 supported by a second haplotype
•147 insertions, 93 deletions, 57 inversions
•18 putative L1 events: 10 deletions and 8 insertions (6 kb insertion)
•89 located within gene regions
•138 in unique regions of the genome
•159 in duplicated regions of the genome

[Figure: genome-wide map of deletions, inversions and insertions (fosmid) relative to "heterochromatic" and "duplicated" regions]

Page 40: Genomic Duplications, Structural Variation and  Disease

PCR Breakpoint Genotyping Assays for Structural Variation

•Tested 11 structural variants (5 insertions, 4 deletions and 2 inversions)
•7 successful assays (6 with >20% minor allele frequency)

Page 41: Genomic Duplications, Structural Variation and  Disease

Illumina Golden-Gate Genotyping Assays for Structural Variation

Page 42: Genomic Duplications, Structural Variation and  Disease

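Order | DNA | Population | Gender | Family | ENCODE
1 | NA18516 | Yoruba | F | Y013 | yes
2 | NA18507 | Yoruba | M | Y009 | yes
3 | NA18992 | Japan | F | - | yes
4 | NA19240 | Yoruba | F | Y117 | -
5 | NA18555 | China | F | - | yes
6 | NA12878 | CEPH | F | 1463 | -
7 | NA19129 | Yoruba | F | Y077 | -
8 | NA12156 | CEPH | F | 1408 | yes
9 | NA18502 | Yoruba | F | Y004 | yes

[Figure: tree of samples (scale 0.1) grouping CEPH/CEU (e.g. NA06991), Yoruba/YRI (e.g. NA18515, NA19143), and Japanese and Chinese samples]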

Human Genome Structural Variation Project
•2 scientific meetings (2005)
•2 working groups (AHG; MSWG, 12/05)
•Coordinating Committee (1/06)
•NIH Council (2/06)
•Press Release (3/15/06)

•Goal: Complete Characterization of Structural Variation in 48 HapMap Samples

Page 43: Genomic Duplications, Structural Variation and  Disease

Comparison of Detected "Fosmid" Variants from Two Individuals

Sample | Sex | Library | Ethnicity | Total SVs | Insertions | Deletions | Inversions
NA15510 | Female | G248 | unknown | 297 | 139 | 102 | 56
NA18507 | Male | ABC7 | Yoruba | 380 | 162 | 167 | 51
Combined* | NA | NA | NA | 621 | 283 | 238 | 110

*Combined represents the total number of non-overlapping variants.
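A combined count of non-overlapping variants can be obtained by pooling both call sets and merging overlapping intervals. The rule below (any shared or touching base merges two calls, regardless of variant type) is an assumption; the slide does not say how overlap was defined, so this sketch is not expected to reproduce the 621 figure.

```python
def count_nonoverlapping(*callsets):
    """Pool (chrom, start, end) intervals from several samples and count merged clusters."""
    intervals = sorted(iv for calls in callsets for iv in calls)
    count = 0
    cur_chrom, cur_end = None, None
    for chrom, start, end in intervals:
        if chrom != cur_chrom or start > cur_end:
            count += 1                   # starts a new, non-overlapping cluster
            cur_chrom, cur_end = chrom, end
        else:
            cur_end = max(cur_end, end)  # overlaps the current cluster: extend it
    return count


# Hypothetical call sets for two individuals.
sample_a = [("chr1", 100, 200), ("chr1", 1000, 1500), ("chr2", 50, 80)]
sample_b = [("chr1", 150, 300), ("chr3", 10, 20)]
print(count_nonoverlapping(sample_a, sample_b))  # 4
```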

Page 44: Genomic Duplications, Structural Variation and  Disease

Complementary Approaches

•1503 variants, 115 Mb, 800 genes structurally variant

Eichler (2006) Nat. Genet

Page 45: Genomic Duplications, Structural Variation and  Disease

Summary

•Humans are relatively unique in the size, proportion and architecture of interspersed segmental duplications

•Large-Scale Variation
– Normals: identified 257 CNPs using a microarray targeted to duplicated regions
– IMR: identified 23 sites (>2 BACs) unique to patients (n=291 probands); 5 are recurrent and 7 are confirmed de novo, defining candidate novel genomic disorders

•Fine-Scale Variation: developed an approach to map and sequence common fine-scale variation within the human population; estimate ~200-300 differences >8 kb between 2 individuals

Page 46: Genomic Duplications, Structural Variation and  Disease
Page 47: Genomic Duplications, Structural Variation and  Disease

Models of Human “Genetic” Disease

1) Simple Mendelian: one gene, one disease; familial, highly penetrant, small fraction of the population. E.g. cystic fibrosis

2) Chromosome Disease: large chromosomal regions; non-familial, sporadic, relatively high frequency. E.g. Turner syndrome

3) Genomic Disease: familial and/or recurrent; deletion or duplication of a large number of genes; dosage effects. E.g. Prader-Willi syndrome

4) Complex Traits: multiple genes plus environment; familial, variably penetrant, large fraction of the population; susceptibility genes. E.g. hypertension

Page 48: Genomic Duplications, Structural Variation and  Disease

Acknowledgements

CWRU/UChicago: Stuart Schwartz, Laurie Christ

Eichler Lab: Eray Tuzun, Andy Sharp, Devin Locke, Matthew Johnson, Zhaoshi Jiang, Jon Bleyhl, Sean McGrath, Tera Newman, Jeff Bailey, Anne Morrison, Lisa Pertz, Ze Cheng, Xinwei She, James Sprague

UWGSC: Maynard Olson, Rajinder Kaul, Hillary Hayden, Eric Haugen

Agencourt: Doug Smith

Oxford: Jonathan Flint, Samantha Knight

NHGRI: Jim Mullikin

UCSF: Dan Pinkel, Donna Albertson

UW: Debbie Nickerson, Mark Rieder, Chris Carlson, Josh Smith

Page 49: Genomic Duplications, Structural Variation and  Disease

……Finding Novel Human Sequence

Page 50: Genomic Duplications, Structural Variation and  Disease

Kaul et al., unpublished

Sequence of Traversing Fosmid Fills Gaps

Page 51: Genomic Duplications, Structural Variation and  Disease

Singleton Fosmids Extend into Gaps

Kaul et al., unpublished

Page 52: Genomic Duplications, Structural Variation and  Disease

Fosmid Pairs that fail to Map to build35

• 4773 fosmid paired-end sequences fail to map to build35
– 1613 have 150 bp >Q30 at either end and >100 bp of unique sequence
• 1416 of these have no hit to HTGS BAC sequence
• 1503 BLAST hit chimpanzee WGS, but only 403 within the chimp assembly
• Estimated to represent ~10-20 Mb

• 1503 of these selected for fingerprinting (4 enzymes)
• Four independent restriction enzymes (EcoRI, HindIII, BglII and NsiI)
• Contigs constructed from 1376 clones (95% success rate) using the Composite Mutual Overlap Statistic (CMOS)

Page 53: Genomic Duplications, Structural Variation and  Disease

FISH Summary of Orphan Fosmids

•52 contigs tested by FISH
•15 subtelomeric, 5 acrocentric and 5 pericentromeric
•22 interstitial euchromatin (9 corresponding to known gaps)
•10 contigs: no signals observed against 2 individuals (6/10 largest)