Copy Number Variation
Eleanor Feingold
University of Pittsburgh
March 2012
What do we mean by “copy number variation?”
GCTCATATATATTTG
kb - Mb (gene or gene region)
Copy number variation in a gene or gene region
“normal”
duplication of one gene
duplication of several genes
deletion
duplication of part of a gene
Classical copy number study types
Cancer genetics
What: Find chromosomal segments (usually large ones) that are duplicated and/or deleted in tumor cell lines
Why: Learn something about cancer biology, or implications for treatment and prognosis
Clinical pediatrics
What: Detect inherited or de novo deletions in individuals
Why: “Diagnose” birth defects
And now: Genetic association studies for CNVs
1) Collect cases and controls.
2) “Genotype” everyone at a CNV.
3) Test genotype/phenotype association.
copy number   0     1     2+
cases         65    133   202
controls      16    81    316
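One way to carry out step 3 on the table above is a Pearson chi-square test of independence. The slides do not say which test was intended, so this is only a minimal sketch under that assumption; the function name `chi_square_2x3` is made up for illustration. It uses the fact that, with 2 degrees of freedom, the chi-square tail probability is exp(-x/2), so no statistics library is needed.

```python
import math

# Counts from the slide: rows are cases/controls, columns are copy number 0, 1, 2+.
table = {
    "cases": [65, 133, 202],
    "controls": [16, 81, 316],
}

def chi_square_2x3(table):
    """Pearson chi-square test of independence for a 2 x 3 table (hypothetical helper)."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)

    stat = 0.0
    for r, row in enumerate(rows):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / grand_total
            stat += (observed - expected) ** 2 / expected

    df = (len(rows) - 1) * (len(col_totals) - 1)
    # The closed-form tail probability below is only valid for df == 2.
    assert df == 2, "this sketch only handles 2 x 3 tables"
    p_value = math.exp(-stat / 2)
    return stat, df, p_value

stat, df, p_value = chi_square_2x3(table)
print(f"chi-square = {stat:.1f}, df = {df}, p = {p_value:.2g}")
```

A test like this treats the three copy number classes as unordered categories. A trend test across 0, 1, 2+ would be an alternative if a dose effect is expected.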
How do we assay copy number variation?
Generation 1 - Array CGH
What
Microarray of clones (e.g. BACs)
Usually on glass slide
Competitive hybridization of test and reference samples.
Measure fluorescence ratio clone by clone.
Limitations
Large clones.
Sparse coverage.
High noise due to spotting process.
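The clone-by-clone fluorescence ratio is usually examined on a log2 scale. The sketch below shows the idealized values one would expect against a two-copy reference; it assumes no noise and a pure test sample, neither of which holds in real data. The function name `log2_ratio` is made up for illustration.

```python
import math

def log2_ratio(test_intensity, reference_intensity):
    """Log2 of the test/reference fluorescence ratio for one clone."""
    return math.log2(test_intensity / reference_intensity)

# Idealized case: intensity proportional to copy number, reference has 2 copies.
for label, test_copies in [("single-copy loss", 1), ("no change", 2), ("single-copy gain", 3)]:
    print(f"{label}: {log2_ratio(test_copies, 2):.3f}")
```

Note the asymmetry: a single-copy loss moves the log2 ratio by a full unit, while a single-copy gain moves it by less than 0.6.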
Generation 2 - SNP chips
What
High-throughput SNP genotyping platforms (e.g. Affymetrix, Illumina)
Advantage
Hundreds of thousands of points of info.
Disadvantages
Technology was never intended for measuring copy number.
SNPs on chip selected to avoid CNV regions by design.
Generation 3 - SNP chips with CNV markers (Affy 6.0, Illumina 1M)
Advantages
SNPs in known CNV regions are now included.
Also have “non-polymorphic SNPs” (SNs?)
Affymetrix
200K probes in 5K known large CNV regions
700K probes “evenly spaced along the genome”
Illumina
1M markers in 10K regions of various types and sizes
Generation 4 (Illumina 2.5M, 5M)
Changes
Got rid of the non-polymorphic markers.
Special coverage of CNV regions???
Are these better or worse for CNVs than the previous generation?
What data do these technologies give us, and how do we use it?
BB
AB
AA
Standard genotyping
Genotype information is in the angle (relative intensity of the two alleles).
Copy number information is in the distance from the origin (total intensity).
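The angle/distance description can be made concrete with a polar transform of the two allele intensities. The sketch below uses one common convention, an angle scaled to the range 0 to 1 and a total intensity equal to the sum of the two signals; the slides do not specify a formula, so treat both choices and the name `polar_transform` as assumptions.

```python
import math

def polar_transform(a_intensity, b_intensity):
    """Return (theta, r) for one SNP.

    theta: angle scaled to [0, 1]; near 0 for all-A, 0.5 for equal A and B,
           near 1 for all-B genotypes.
    r:     total intensity (sum of the two allele signals).
    """
    theta = (2 / math.pi) * math.atan2(b_intensity, a_intensity)
    r = a_intensity + b_intensity
    return theta, r

# Idealized signals, assuming each allele copy contributes 0.5 units.
for genotype, (a, b) in {"AA": (1.0, 0.0), "AB": (0.5, 0.5), "AAB": (1.0, 0.5)}.items():
    theta, r = polar_transform(a, b)
    print(f"{genotype}: theta = {theta:.3f}, r = {r:.2f}")
```

In this idealized picture AAB sits between AA and AB in angle and has 1.5 times the total intensity of a two-copy genotype.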
In theory
Three copies: AAA, AAB, ABB, BBB
Two copies: AA, AB, BB
One copy: A, B
Zero copies: null
AAA and AA
AAB
AB
ABB
BBB and BB
But when you look at the data …
All SNPs on chromosome 21
Panels: trisomic (Down syndrome) and disomic
Axes: total intensity (disomic) vs. total intensity (trisomic)
In theory
Three copies: AAA, AAB, ABB, BBB
Two copies: AA, AB, BB
One copy: A, B
Zero copies: null
In practice
A, B, null
So how are copy numbers called?
Look for runs of SNPs that are high or low in intensity
Many available algorithms, e.g. HMM, CBS, change-point
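To illustrate the idea of looking for runs, here is a deliberately naive threshold-based run finder. It is not an HMM, CBS, or change-point method, and the threshold and minimum run length are arbitrary illustrative values; the function name `find_runs` is made up.

```python
def find_runs(log_ratios, threshold=0.3, min_length=5):
    """Find runs of consecutive markers with high or low intensity.

    Returns a list of (start, end, state) with end exclusive, where state is
    "gain" (log2 ratio above +threshold) or "loss" (below -threshold).
    Runs shorter than min_length markers are dropped.
    """
    def state_of(value):
        if value > threshold:
            return "gain"
        if value < -threshold:
            return "loss"
        return "normal"

    runs = []
    start = 0
    for i in range(1, len(log_ratios) + 1):
        # Close the current run at the end of the data or when the state changes.
        if i == len(log_ratios) or state_of(log_ratios[i]) != state_of(log_ratios[start]):
            state = state_of(log_ratios[start])
            if state != "normal" and i - start >= min_length:
                runs.append((start, i, state))
            start = i
    return runs

# Toy signal: a 6-marker loss, then a 3-marker gain that is too short to report.
signal = [0.0] * 10 + [-0.9] * 6 + [0.05] * 8 + [0.6] * 3 + [0.0] * 4
print(find_runs(signal))
```

A single noisy marker splits a run in this sketch, which is one reason real callers use smoothing or explicit statistical models instead.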
Basic picture
Komura et al. Genome Research, 2006
More complex examples (cancer genetics)
Peiffer et al. Genome Research, 2006
Angle (genotype info)
total intensity
amplification
AA
AB
BB
deletion deletion
Extra copy of whole chromosome
total intensity high over whole chromosome
3 genotype groups
No copy number change, but a region of homozygosity (LOH)
LOH
Basic picture
Wang et al. Genome Research, 2007
Chromosome 9
A few statistical issues to think about …
(there’s still a lot to do)
Many run-calling algorithms are oriented towards clinical applications.
Many CNV detection algorithms are very conservative - aim for zero false positive rate.
Most use normalization methods that assume a large reference population is not available.
Many use models that make assumptions about what kinds of variation are likely (e.g. cancer).
Family data should be modeled together.
CNV “calls” will be much more accurate if you use the whole family, but the model you use should depend on whether you are expecting de novo mutations or not.
For some diseases you’ll expect associations with de novo changes. For others you might expect inherited variants.
How do we group CNVs for association testing?
deletion
deletion
deletion deletion
duplication
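One simple possibility, shown here only as a sketch, is to merge overlapping calls from different individuals into regions and test each region. Whether deletions and duplications in the same region should be pooled is exactly the open question, so the sketch keeps the type of every call. The function name `merge_into_regions` and the coordinates are made up for illustration.

```python
def merge_into_regions(cnv_calls):
    """Group CNV calls (start, end, kind) into regions of overlapping calls."""
    regions = []
    for start, end, kind in sorted(cnv_calls):
        if regions and start <= regions[-1]["end"]:
            # Overlaps the current region: extend it and record the call.
            region = regions[-1]
            region["end"] = max(region["end"], end)
            region["calls"].append((start, end, kind))
        else:
            regions.append({"start": start, "end": end, "calls": [(start, end, kind)]})
    return regions

# Made-up coordinates: three overlapping deletions, then a deletion
# overlapping a duplication.
calls = [
    (100, 200, "deletion"),
    (150, 260, "deletion"),
    (180, 220, "deletion"),
    (400, 500, "deletion"),
    (450, 520, "duplication"),
]
for region in merge_into_regions(calls):
    kinds = sorted({kind for _, _, kind in region["calls"]})
    print(region["start"], region["end"], len(region["calls"]), kinds)
```

Chains of partially overlapping calls can produce very long regions under this rule, so a reciprocal-overlap criterion may be preferable in practice.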
Separate methods for deletions?
Deletions are easier to detect than other changes.
Deletions are likely to have simpler biological effects.
The most important one …
The technology is still NOT intended for reliably and comparably measuring total intensity!
Total intensity numbers are very sensitive to DNA source, sample handling, etc., so extreme measures must be taken to ensure that cases and controls are comparable.
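A basic sanity check, sketched below, is to compare per-sample mean total intensity between cases and controls before any CNV calling. A large difference suggests a batch or handling effect rather than biology. The numbers are made up for illustration, and Welch's t statistic is just one reasonable choice of summary, not something the slides prescribe.

```python
import math
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t statistic for a difference in means (unequal variances)."""
    standard_error = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error

# Made-up per-sample mean total intensities (one value per individual).
case_means = [1.02, 0.98, 1.05, 1.01, 0.99]
control_means = [0.90, 0.93, 0.88, 0.91, 0.92]

t = welch_t(case_means, control_means)
print(f"Welch t = {t:.2f}")
```

If a statistic like this is far from zero, apparent case/control differences in copy number calls are likely to reflect the intensity shift itself.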