Copy Number Variation Eleanor Feingold University of Pittsburgh March 2012


Copy Number Variation

Eleanor Feingold

University of Pittsburgh

March 2012


What do we mean by “copy number variation”?

GCTCATATATATTTG

kb - Mb (gene or gene region)


Copy number variation in a gene or gene region

“normal”

duplication of one gene

duplication of several genes

deletion

duplication of part of a gene


Classical copy number study types

Cancer genetics

What: Find chromosomal segments (usually large ones) that are duplicated and/or deleted in tumor cell lines

Why: Learn something about cancer biology, or implications for treatment and prognosis

Clinical pediatrics

What: Detect inherited or de novo deletions in individuals

Why: “Diagnose” birth defects


And now: Genetic association studies for CNVs

1) Collect cases and controls.

2) “Genotype” everyone at a CNV.

[Figure: individuals labeled with their copy numbers at the CNV]

3) Test genotype/phenotype association.

copy number    0     1     2+
cases          65    133   202
controls       16    81    316
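
As a minimal sketch of step 3, the table above can be tested with a Pearson chi-square test of independence. The choice of test is an assumption made here for illustration; the slides do not say which test was used. The function name `chi_square_test` is hypothetical, and the p-value uses the closed form exp(-x/2), which holds only for 2 degrees of freedom.

```python
import math

def chi_square_test(table):
    """Pearson chi-square test of independence for a 2 x 3 count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if genotype and phenotype were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    if df != 2:
        raise ValueError("the closed-form p-value below is only valid for df == 2")
    # For 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    p_value = math.exp(-chi2 / 2.0)
    return chi2, df, p_value

# Counts from the slide: copy number 0, 1, 2+ in cases and controls.
counts = [[65, 133, 202],
          [16, 81, 316]]
chi2, df, p_value = chi_square_test(counts)
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.3g}")
```

A trend test across ordered copy-number classes would be another reasonable choice; the sketch treats the three classes as unordered categories.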


How do we assay copy number variation?


Generation 1 - Array CGH

What

Microarray of clones (e.g. BACs)

Usually on glass slide

Competitive hybridization of test and reference samples.

Measure fluorescence ratio clone by clone.

Limitations

Large clones.

Sparse coverage.

High noise due to spotting process.
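
The clone-by-clone fluorescence ratio is usually examined on a log2 scale. The sketch below assumes the reference sample carries two copies everywhere, so an ideal single-copy gain gives log2(3/2), about 0.58, and an ideal single-copy loss gives log2(1/2) = -1. The function name and the intensity values are hypothetical.

```python
import math

def log2_ratios(test_intensity, reference_intensity):
    """Per-clone log2(test / reference) fluorescence ratio."""
    return [math.log2(t / r) for t, r in zip(test_intensity, reference_intensity)]

# Hypothetical fluorescence values for five clones.
test = [1000.0, 1500.0, 500.0, 980.0, 1020.0]
reference = [1000.0, 1000.0, 1000.0, 1000.0, 1000.0]

ratios = log2_ratios(test, reference)
for clone, ratio in enumerate(ratios):
    print(f"clone {clone}: log2 ratio = {ratio:+.2f}")
```

Real array data are noisier than these idealized values, which is why calls are made on runs of neighboring clones rather than on single clones.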


Generation 2 - SNP chips

What

High-throughput SNP genotyping platforms (e.g. Affymetrix, Illumina)

Advantage

Hundreds of thousands of points of info.

Disadvantages

Technology was never intended for measuring copy number.

SNPs on chip selected to avoid CNV regions by design.


Generation 3 - SNP chips with CNV markers (Affy 6.0, Illumina 1M)

Advantages

SNPs in known CNV regions are now included.

Also have “non-polymorphic SNPs” (SNs?)

Affymetrix

200K probes in 5K known large CNV regions

700K probes “evenly spaced along the genome”

Illumina

1M markers in 10K regions of various types and sizes


Generation 4 - (Illumina 2.5M, 5M)

Changes

Got rid of the non-polymorphic markers.

Special coverage of CNV regions???

Are these better or worse for CNVs than the previous generation?


What data do these technologies give us, and how do we use it?


Standard genotyping

[Figure: allele intensity plot with genotype clusters labeled AA, AB, BB]

Genotype information is in the angle (relative intensity of the two alleles).

Copy number information is in the distance from the origin (total intensity).
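
The two quantities described above can be computed from a pair of allele intensities as a polar transform. This is a minimal sketch: the function name, the scaling of the angle to the range 0 to 1, and the example intensities are all assumptions. Some platforms define total intensity as the sum A + B rather than the Euclidean distance used here.

```python
import math

def angle_and_total_intensity(a, b):
    """Convert allele A and B intensities to (angle, total intensity)."""
    # Angle scaled to [0, 1]: 0 means pure A signal, 1 means pure B signal.
    theta = math.atan2(b, a) * 2.0 / math.pi
    # Distance from the origin, as on the slide.
    r = math.hypot(a, b)
    return theta, r

# Hypothetical (A, B) intensities for one SNP in three individuals.
examples = {"AA": (1.0, 0.05), "AB": (0.6, 0.6), "BB": (0.05, 1.0)}
for genotype, (a, b) in examples.items():
    theta, r = angle_and_total_intensity(a, b)
    print(f"{genotype}: angle = {theta:.2f}, total intensity = {r:.2f}")
```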


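In theory

[Figure: allele intensity plot with one cluster per copy-number genotype]

3 copies: AAA, AAB, ABB, BBB

2 copies: AA, AB, BB

1 copy: A, B

0 copies: null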
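
The theoretical clusters follow a simple pattern: with n copies of a biallelic SNP there are n + 1 possible genotype classes, one for each count of B alleles from 0 to n. The sketch below enumerates them; the function name `genotype_classes` is hypothetical.

```python
def genotype_classes(copy_number):
    """List the distinguishable genotype classes at a given copy number."""
    if copy_number == 0:
        return ["null"]
    # One class per possible count of B alleles, from 0 to copy_number.
    return ["A" * (copy_number - n_b) + "B" * n_b
            for n_b in range(copy_number + 1)]

for n in range(4):
    print(n, genotype_classes(n))
```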


But when you look at the data …

[Figure: allele intensity plot for trisomic (Down Syndrome) and disomic samples; labeled clusters: AAA and AA, AAB, AB, ABB, BBB and BB]


All SNPs on chromosome 21

[Figure: total intensity for all SNPs on chromosome 21; axes: total intensity (disomic), total intensity (trisomic); groups: trisomic, disomic]


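In theory

[Figure: allele intensity plot with one cluster per copy-number genotype]

3 copies: AAA, AAB, ABB, BBB

2 copies: AA, AB, BB

1 copy: A, B

0 copies: null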


In practice

[Figure: allele intensity plot as observed; labeled clusters: A, B, null]


So how are copy numbers called?

Look for runs of SNPs that are high or low in intensity

Many available algorithms, e.g. HMM, CBS, change-point
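
To make the idea of "runs" concrete, here is a deliberately naive threshold-based run finder. It is not an HMM, CBS, or change-point method, only an illustration of what those algorithms look for. The function name, thresholds, minimum run length, and data values are all hypothetical.

```python
def find_runs(values, low, high, min_length=3):
    """Find runs of consecutive SNPs with high or low intensity.

    Returns (start, end, label) tuples with end exclusive; label is
    "gain" for values above `high` and "loss" for values below `low`.
    """
    labels = ["gain" if v > high else "loss" if v < low else None
              for v in values]
    runs = []
    start = 0
    for i in range(1, len(labels) + 1):
        # A run ends at the end of the data or where the label changes.
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start] is not None and i - start >= min_length:
                runs.append((start, i, labels[start]))
            start = i
    return runs

# Hypothetical normalized log intensities for 14 consecutive SNPs.
log_intensity = [0.0, 0.1, -0.05, 0.6, 0.55, 0.62, 0.58,
                 0.02, -0.9, -1.1, -1.0, 0.05, 0.7, 0.0]
print(find_runs(log_intensity, low=-0.5, high=0.4))
```

The single high value near the end is ignored because it does not reach the minimum run length. Requiring several consecutive SNPs is what lets run-based methods tolerate noisy individual markers.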


Basic picture


Komura et al., Genome Research, 2006


More complex examples (cancer genetics)

Peiffer et al. Genome Research, 2006


Angle (genotype info)

total intensity

amplification

AA

AB

BB


deletion deletion


Extra copy of whole chromosome

total intensity high over whole chromosome

3 genotype groups


No copy number change, but a region of homozygosity (LOH)

LOH
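
The examples above combine two signals: total intensity separates amplifications and deletions from normal copy number, and the genotype (angle) information separates copy-neutral LOH from normal regions. A rough sketch of that logic follows. The function name, the inputs (a segment's mean log intensity and its fraction of heterozygous SNP calls), and every threshold are hypothetical and would need tuning on real data.

```python
def classify_segment(mean_log_intensity, het_fraction,
                     gain=0.3, loss=-0.3, min_het=0.05):
    """Label a segment from its total intensity and heterozygosity.

    All thresholds are illustrative assumptions, not calibrated values.
    """
    if mean_log_intensity > gain:
        return "amplification"
    if mean_log_intensity < loss:
        return "deletion"
    # Copy number looks normal, but almost no heterozygous genotypes.
    if het_fraction < min_het:
        return "LOH"
    return "normal"

# Hypothetical segments: (mean log intensity, fraction of AB calls).
segments = [(0.5, 0.30), (-0.8, 0.00), (0.02, 0.01), (0.0, 0.35)]
for mean_log_intensity, het_fraction in segments:
    print(classify_segment(mean_log_intensity, het_fraction))
```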


Basic picture

Wang et al. Genome Research, 2007


Chromosome 9


A few statistical issues to think about …

(there’s still a lot to do)


Many run-calling algorithms are oriented towards clinical applications.

Many CNV detection algorithms are very conservative - aim for zero false positive rate.

Most use normalization methods that assume a large reference population is not available.

Many use models that make assumptions about what kinds of variation are likely (e.g. cancer).


Family data should be modeled together.

CNV “calls” will be much more accurate if you use the whole family, but the model you use should depend on whether you are expecting de novo mutations or not.

For some diseases you’ll expect associations with de novo changes. For others you might expect inherited variants.


How do we group CNVs for association testing?

deletion

deletion

deletion deletion

duplication


Separate methods for deletions?

Deletions are easier to detect than other changes.

Deletions are likely to have simpler biological effects.


The most important one …

The technology is still NOT intended for reliably and comparably measuring total intensity!

Total intensity numbers are very sensitive to DNA source, sample handling, etc., so extreme measures must be taken to ensure that cases and controls are comparable.
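
One simple sanity check, sketched here under stated assumptions, is to compare per-sample average total intensity between cases and controls before any CNV calling. A large difference across the whole genome, where no true biological difference is expected, points to a handling or batch effect. The Welch t-statistic is just one possible choice of comparison; the function name and the data values are hypothetical.

```python
import math
import statistics

def welch_t(cases, controls):
    """Welch t-statistic for the difference in group means."""
    mean_diff = statistics.mean(cases) - statistics.mean(controls)
    # Standard error allowing unequal variances in the two groups.
    se = math.sqrt(statistics.variance(cases) / len(cases)
                   + statistics.variance(controls) / len(controls))
    return mean_diff / se

# Hypothetical per-sample mean total intensity (arbitrary units).
case_means = [1.02, 1.05, 0.98, 1.10, 1.07, 1.04]
control_means = [0.91, 0.95, 0.89, 0.97, 0.93, 0.90]

t = welch_t(case_means, control_means)
print(f"Welch t = {t:.2f}")
```

A check like this can flag a problem but cannot fix one; the slide's point is that comparability has to be built into sample collection and processing.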