Transcript
Page 1: Single Nucleotide Polymorphism Copy Number Variations and SNP Array

Single Nucleotide Polymorphism, Copy Number Variations and SNP Array

Xiaole Shirley Liu and Jun Liu

Page 2: Single Nucleotide Polymorphism Copy Number Variations and SNP Array

Outline
• Definition and motivation
• SNP distribution and characteristics
– Allele frequency, LD, population stratification
• SNP discovery (unknown) and genotyping (known)
– CNV detection

Page 3: Single Nucleotide Polymorphism Copy Number Variations and SNP Array

Polymorphism
• Polymorphism: sites/genes with “common” variation, where the less common allele has frequency ≥ 1%; otherwise the variant is called rare and not polymorphic
• First discovered (early 1980s): restriction fragment length polymorphism (RFLP)
• Some definitions:
– Locus: position on a chromosome where a sequence or gene is located
– Allele: alternative form of the DNA at a locus
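The 1% rule above can be sketched in a few lines. This is only an illustration: the genotype strings, the function names `minor_allele_frequency` and `is_polymorphic`, and the sample counts are hypothetical and not from the slides.

```python
from collections import Counter

def minor_allele_frequency(genotypes):
    """Return (minor_allele, frequency) for a biallelic locus.

    `genotypes` is a list of two-character strings such as "AA", "AG", "GG",
    one per diploid individual (an assumed input format).
    """
    allele_counts = Counter(allele for g in genotypes for allele in g)
    if len(allele_counts) < 2:
        # Only one allele observed: no minor allele at this locus.
        return None, 0.0
    total = sum(allele_counts.values())
    minor_allele, minor_count = min(allele_counts.items(), key=lambda kv: kv[1])
    return minor_allele, minor_count / total

def is_polymorphic(genotypes, threshold=0.01):
    """Apply the 'less common allele frequency >= 1%' rule."""
    _, maf = minor_allele_frequency(genotypes)
    return maf >= threshold

# Made-up sample: 90 AA, 9 AG and 1 GG individuals.
sample = ["AA"] * 90 + ["AG"] * 9 + ["GG"] * 1
print(minor_allele_frequency(sample))
print(is_polymorphic(sample))
```

Counting alleles rather than individuals matters here, because each diploid individual contributes two alleles to the frequency.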

Page 4: Single Nucleotide Polymorphism Copy Number Variations and SNP Array

Polymorphism
• Single Nucleotide Polymorphism (SNP)
– Occasionally short (1-3 bp) indels are considered SNPs too
– Arise from a DNA-replication mistake in an individual germ-line cell, then transmitted to offspring
– ~90% of human genetic variation
• Copy number variations (CNV)
– May or may not be genetic

Page 5: Single Nucleotide Polymorphism Copy Number Variations and SNP Array

Why Should We Care
• Disease gene discovery
– Association studies: certain SNPs are associated with susceptibility to diabetes
– Chromosome aberrations: duplication / deletion might cause cancer
• Personalized medicine
– A drug may only be effective if you carry a particular allele

Page 6: Single Nucleotide Polymorphism Copy Number Variations and SNP Array


Page 7: Single Nucleotide Polymorphism Copy Number Variations and SNP Array


Page 8: Single Nucleotide Polymorphism Copy Number Variations and SNP Array

SNP Distribution
• Most common type of variation, about 1 SNP per 100-300 bp
– Balance between the rate at which mutations are introduced and the rate at which polymorphisms are lost
– Most mutations are lost within a few generations
• 2/3 are C/T differences
• In non-coding regions, often fewer SNPs in more conserved regions
• In coding regions, often more synonymous than non-synonymous SNPs

Page 9: Single Nucleotide Polymorphism Copy Number Variations and SNP Array


SNP Characteristics: Allele Frequency Distribution

• Most alleles are rare (minor allele frequency < 10%)

Page 10: Single Nucleotide Polymorphism Copy Number Variations and SNP Array


Mode of inheritance

Page 11: Single Nucleotide Polymorphism Copy Number Variations and SNP Array

SNP Characteristics: Hardy-Weinberg equilibrium (HWE)
– In a population with genotypes BB, bb, and Bb, if p = freq(B) and q = freq(b), the frequencies of BB, bb, and Bb will be p^2, q^2, and 2pq respectively at equilibrium, and will not change.
– Assumptions for HWE: no mutation, no immigration or emigration, infinite population size, no selective pressure, random mating. A population can deviate from HWE if any of these is violated
– It provides a baseline against which to measure change, e.g., inbreeding index:
– More than 2 alleles:
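A minimal sketch of these HWE calculations follows. The formulas for the inbreeding index and for more than two alleles are not preserved in this transcript, so the code assumes the commonly used definitions: F = 1 - H_obs / H_exp, where H is the heterozygote frequency, and genotype frequencies p_i^2 (homozygotes) and 2 p_i p_j (heterozygotes) for k alleles. All function names are hypothetical.

```python
def hwe_expected(p):
    """Expected genotype frequencies at HWE for p = freq(B), q = 1 - p."""
    q = 1.0 - p
    return {"BB": p * p, "Bb": 2 * p * q, "bb": q * q}

def inbreeding_coefficient(n_BB, n_Bb, n_bb):
    """Assumed definition: F = 1 - H_obs / H_exp, with H_exp = 2pq."""
    n = n_BB + n_Bb + n_bb
    p = (2 * n_BB + n_Bb) / (2 * n)      # allele frequency of B from genotype counts
    h_exp = 2 * p * (1 - p)              # heterozygosity expected under HWE
    if h_exp == 0:
        raise ValueError("locus is monomorphic; F is undefined")
    h_obs = n_Bb / n                     # observed heterozygosity
    return 1.0 - h_obs / h_exp

def hwe_expected_multi(freqs):
    """Expected genotype frequencies for k alleles given {allele: frequency}."""
    alleles = sorted(freqs)
    expected = {}
    for i, a in enumerate(alleles):
        for b in alleles[i:]:
            expected[(a, b)] = freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]
    return expected

print(hwe_expected(0.6))
print(inbreeding_coefficient(50, 20, 30))   # fewer heterozygotes than expected, so F > 0
print(hwe_expected_multi({"A": 0.5, "B": 0.3, "C": 0.2}))
```

Under this definition F is 0 when the observed heterozygosity matches 2pq and moves toward 1 as heterozygotes become rarer than expected.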

Page 12: Single Nucleotide Polymorphism Copy Number Variations and SNP Array

SNP Characteristics: Linkage Disequilibrium
• Equilibrium vs. disequilibrium
• LD: if alleles at two loci occur together more often than can be accounted for by chance, this indicates that the two alleles are physically close on the DNA
– In mammals, LD is often lost at ~100 kb
– In fly, LD often decays within a few hundred bases

Page 13: Single Nucleotide Polymorphism Copy Number Variations and SNP Array

SNP Characteristics: Linkage Disequilibrium
• Statistical significance of LD
– Chi-square test with 1 df
– $e_{ij} = n_{i.} n_{.j} / n_T$
– $\chi^2 = \sum_{i,j} (n_{ij} - e_{ij})^2 / e_{ij}$

        B1     B2     Total
A1      n11    n12    n1.
A2      n21    n22    n2.
Total   n.1    n.2    nT
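The test above can be sketched directly from the 2x2 table of haplotype counts. The function name `chi_square_2x2` and the example counts are made up for illustration; the p-value uses the identity that, for 1 df, the chi-square tail probability equals erfc(sqrt(x / 2)).

```python
import math

def chi_square_2x2(n11, n12, n21, n22):
    """Chi-square statistic and p-value (1 df) for a 2x2 table of counts.

    Rows are alleles A1/A2, columns are alleles B1/B2.
    Assumes every row and column total is non-zero.
    """
    observed = [[n11, n12], [n21, n22]]
    row_totals = [n11 + n12, n21 + n22]          # n1., n2.
    col_totals = [n11 + n21, n12 + n22]          # n.1, n.2
    n_total = sum(row_totals)                    # nT
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n_total   # e_ij
            chi2 += (observed[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2.0))   # upper tail of chi-square, 1 df
    return chi2, p_value

# Hypothetical haplotype counts showing strong association between A1 and B1.
chi2, p_value = chi_square_2x2(40, 10, 10, 40)
print(chi2, p_value)
```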

Page 14: Single Nucleotide Polymorphism Copy Number Variations and SNP Array

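SNP Characteristics: Linkage Disequilibrium
• Three ways to calculate LD, with $p_{11}$ the frequency of the A1B1 haplotype, $p_i$ = freq(Ai), and $q_j$ = freq(Bj)
– $D = p_{11} - p_1 q_1$ (observed minus expected)
– $D' = D / D_{\max}$, where $D_{\max} = \min(p_1 q_2, p_2 q_1)$ if $D > 0$ and $D_{\max} = \max(-p_1 q_1, -p_2 q_2)$ if $D < 0$
– $r^2 = D^2 / (p_1 p_2 q_1 q_2)$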
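A short sketch of the three LD measures may help. It assumes the standard definitions: D is the observed A1B1 haplotype frequency minus the product of the allele frequencies, D' is D scaled by its largest attainable value of the same sign, and r^2 is D^2 divided by the product of all four allele frequencies. The function name and the input frequencies are hypothetical.

```python
def ld_measures(p11, p1, q1):
    """Return (D, D', r^2) for two biallelic loci.

    p11 = freq(A1B1 haplotype), p1 = freq(A1), q1 = freq(B1).
    Assumes 0 < p1 < 1 and 0 < q1 < 1 so that r^2 is defined.
    """
    p2, q2 = 1.0 - p1, 1.0 - q1
    d = p11 - p1 * q1                          # observed minus expected
    if d > 0:
        d_max = min(p1 * q2, p2 * q1)          # upper bound on D
    elif d < 0:
        d_max = max(-p1 * q1, -p2 * q2)        # lower bound on D (negative)
    else:
        d_max = None                           # D = 0: linkage equilibrium
    d_prime = 0.0 if d_max is None else d / d_max
    r2 = d * d / (p1 * p2 * q1 * q2)
    return d, d_prime, r2

print(ld_measures(0.3, 0.6, 0.4))   # positive D
print(ld_measures(0.1, 0.5, 0.5))   # negative D
```

Dividing a negative D by a negative bound keeps D' between 0 and 1 in both cases.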

Page 15: Single Nucleotide Polymorphism Copy Number Variations and SNP Array

SNP Characteristics: Linkage Disequilibrium
• Haplotype block: a cluster of linked SNPs
• Haplotype boundary: sequence shows strong LD within blocks and no LD between blocks; the boundaries reflect recombination hotspots
• Haplotype size distribution

Page 16: Single Nucleotide Polymorphism Copy Number Variations and SNP Array


SNP Characteristics:Linkage Disequilibrium

• Can see haplotype block: a cluster of linked SNPs

Page 17: Single Nucleotide Polymorphism Copy Number Variations and SNP Array

SNP Characteristics: Linkage Disequilibrium
• [C/T] [A/G] T X C [A/C] [T/A]
– Possible haplotypes: 2^4 = 16
– In reality, a few common haplotypes explain 90% of the variation
• Tagging SNPs:
– SNPs that capture most of the variation in haplotypes
– Removes redundancy
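The haplotype count for the example sequence can be checked by enumeration. This sketch treats each bracketed position as a biallelic SNP and every other position, including the literal X from the slide, as fixed; the variable names are illustrative.

```python
from itertools import product

# Each inner list holds the alleles allowed at one position of
# [C/T] [A/G] T X C [A/C] [T/A]; fixed positions have a single entry.
sites = [["C", "T"], ["A", "G"], ["T"], ["X"], ["C"], ["A", "C"], ["T", "A"]]

# Cartesian product over positions gives every possible haplotype.
haplotypes = ["".join(alleles) for alleles in product(*sites)]

print(len(haplotypes))   # four biallelic sites give 2**4 combinations
print(haplotypes[:4])
```

In real data most of these combinations are rare or absent, which is why a few tagging SNPs can stand in for the rest.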

Page 18: Single Nucleotide Polymorphism Copy Number Variations and SNP Array

SNP Characteristics: Population Stratification
• Population stratification: individuals are sampled from two genetically different populations; the stratification may be environmental, cultural, or genetic
• Could give spurious results in case-control association studies, as in the example of “chopstick genes”
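A toy calculation shows how such a spurious association can arise. All counts below are invented for illustration: within each hypothetical population the allele and the trait are independent, yet pooling the two populations produces a strong apparent association.

```python
def odds_ratio(table):
    """Odds ratio for a 2x2 table [[a, b], [c, d]].

    Rows: allele carriers / non-carriers. Columns: trait present / absent.
    """
    (a, b), (c, d) = table
    return (a * d) / (b * c)

def pool(table1, table2):
    """Add two 2x2 tables cell by cell, as if population labels were ignored."""
    return [[table1[i][j] + table2[i][j] for j in range(2)] for i in range(2)]

# Hypothetical population 1: 80% carriers, 90% with the trait, independent.
pop1 = [[720, 80], [180, 20]]
# Hypothetical population 2: 20% carriers, 10% with the trait, independent.
pop2 = [[20, 180], [80, 720]]

pooled = pool(pop1, pop2)
print(odds_ratio(pop1), odds_ratio(pop2))   # no association within either population
print(odds_ratio(pooled))                   # apparent association after pooling
```

The pooled odds ratio reflects only the difference between the populations, not any effect of the allele on the trait.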

Page 19: Single Nucleotide Polymorphism Copy Number Variations and SNP Array


Using genetic variation to study populations

Page 20: Single Nucleotide Polymorphism Copy Number Variations and SNP Array

SNP Discovery Methods
• Sequencing individuals for differences: too costly
• First check whether big regions have SNPs
– Basic idea: denature and re-anneal two samples, detect heteroduplex
– Can pool samples (e.g. 10 Africans with 10 Caucasians) to speed screening
• Resequence to verify
• dbSNP: 12M RefSNPs, 6M validated

Page 21: Single Nucleotide Polymorphism Copy Number Variations and SNP Array

SNP Genotyping
• For a known locus TT C/A AG, does this individual have CC, AA or AC? Many methods
• Hybridization-based methods
– Dynamic allele-specific hybridization
– Molecular beacons
– SNP-array chip (simultaneously genotype thousands of SNPs)
• Enzyme-based methods
– RFLP
– PCR-based methods
– Flap endonuclease
– Primer extension
– Oligonucleotide ligase assay
• Other methods (based on physical properties of DNA)

Page 22: Single Nucleotide Polymorphism Copy Number Variations and SNP Array

SNP Array
• One SNP at a time or genome-wide (SNP array)

Page 23: Single Nucleotide Polymorphism Copy Number Variations and SNP Array

40 Probes Used Per SNP
• Allele call
– AA, BB, AB
• Signal
– Theoretically 1A+1B, 2A, 2B
– But could have 1A+3B: amplified!

Page 24: Single Nucleotide Polymorphism Copy Number Variations and SNP Array

SNP Chip for LOH
• Loss of Heterozygosity (LOH): tumor suppressor gene inactivation by allelic loss in cancers
[Figure: Normal, First genetic hit, Cancer; LOH]
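One simple way to flag candidate LOH from array calls is to compare paired normal and tumor genotypes at each SNP. This is a sketch under that assumption, not the method used in the slides; the function name and the call lists are hypothetical.

```python
def find_loh(normal_calls, tumor_calls):
    """Return indices of SNPs that are heterozygous in the normal sample
    and homozygous in the paired tumor sample.

    Calls use the AA / AB / BB coding of SNP array allele calls.
    """
    loh_sites = []
    for index, (normal, tumor) in enumerate(zip(normal_calls, tumor_calls)):
        if normal == "AB" and tumor in ("AA", "BB"):
            loh_sites.append(index)
    return loh_sites

# Made-up paired calls for five SNPs.
normal = ["AB", "AA", "AB", "BB", "AB"]
tumor = ["AA", "AA", "AB", "BB", "BB"]
print(find_loh(normal, tumor))
```

Only SNPs that are heterozygous in the normal sample are informative, since a homozygous site looks the same before and after allelic loss.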

Page 25: Single Nucleotide Polymorphism Copy Number Variations and SNP Array

SNP Array for CNV
• Collect normal / diseased samples on SNP arrays
• Probe normalization, background subtraction
• Use HMM to infer CNV
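The HMM step can be illustrated with a small Viterbi decoder over per-SNP log2 intensity ratios (sample vs. reference). Everything specific here is an assumption made for the sketch: three hidden states (loss, normal, gain), Gaussian emissions with a shared standard deviation, a fixed probability of staying in the same state, and the example ratios. The slides do not specify the actual model.

```python
import math

STATES = ["loss", "normal", "gain"]
# Assumed emission means: log2(1/2), log2(2/2), log2(3/2) for 1, 2, 3 copies.
MEANS = {"loss": -1.0, "normal": 0.0, "gain": math.log2(1.5)}
SIGMA = 0.2    # assumed noise level of the log2 ratios
STAY = 0.95    # assumed probability of keeping the same copy-number state

def log_emission(state, x):
    """Log density of a Gaussian centered on the state's expected log2 ratio."""
    z = (x - MEANS[state]) / SIGMA
    return -0.5 * z * z - math.log(SIGMA * math.sqrt(2 * math.pi))

def log_transition(prev, cur):
    """Log probability of moving from state prev to state cur."""
    if prev == cur:
        return math.log(STAY)
    return math.log((1 - STAY) / (len(STATES) - 1))

def viterbi(log_ratios):
    """Most likely state sequence for a list of log2 ratios along a chromosome."""
    if not log_ratios:
        return []
    # Uniform prior over states for the first SNP.
    score = {s: -math.log(len(STATES)) + log_emission(s, log_ratios[0]) for s in STATES}
    backpointers = []
    for x in log_ratios[1:]:
        new_score, pointer = {}, {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda prev: score[prev] + log_transition(prev, cur))
            new_score[cur] = score[best_prev] + log_transition(best_prev, cur) + log_emission(cur, x)
            pointer[cur] = best_prev
        score = new_score
        backpointers.append(pointer)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    path.reverse()
    return path

# Made-up log2 ratios: normal, then a gained segment, normal, then a lost segment.
ratios = [0.05, -0.1, 0.0, 0.6, 0.55, 0.7, 0.5, 0.02, -0.05, -0.9, -1.1, -1.0]
print(viterbi(ratios))
```

The sticky transition probability is what lets the HMM call contiguous segments instead of reacting to single noisy probes; a real analysis would estimate these parameters from normalized data.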

Page 26: Single Nucleotide Polymorphism Copy Number Variations and SNP Array

Integrate CNV with Expression to Identify Oncogene MITF in Melanoma

Page 27: Single Nucleotide Polymorphism Copy Number Variations and SNP Array

Summary
• SNP and CNV
• SNP distribution and characteristics
– Allele frequency (minor allele > 1%)
– LD: linkage ~ physical proximity
– Population stratification
• SNP discovery: heteroduplex
• SNP genotyping
– SNP array
– CNV detection: HMM

Page 28: Single Nucleotide Polymorphism Copy Number Variations and SNP Array

Acknowledgement
• Stefano Monti
• Tim Niu
• Kenneth Kidd, Judith Kidd and Glenys Thomson
• Joel Hirschhorn
• Greg Gibson & Spencer Muse

