Causes of regulatory variation in the human genome. Manolis Dermitzakis, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK. [email protected]


  • Slide 1
  • Causes of regulatory variation in the human genome. Manolis Dermitzakis, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK. [email protected]
  • Slide 2
  • Human genome: ~25,000 genes. 1-1.5% of human DNA is coding. Is the remaining 98.5% junk?
  • Slide 3
  • Gene expression as a phenotype. Altered patterns of gene expression can lead to disease, e.g., Type 1 diabetes, Burkitt's lymphoma. Widespread intraspecific variation. Heritable genetic variation for transcript levels: familial aggregation of expression profiles (Cheung et al. 2003); in humans, ~30% of surveyed loci exhibited a genetic component for expression differences (Monks et al. 2004; Schadt et al. 2003). Much of the influential variation is located cis- to the coding locus: in humans, mouse, and maize, 35%-50% of the genetic basis for intraspecific differences in transcription level is cis- to the coding locus (e.g. Morley et al. 2004; Schadt et al. 2003; Stranger et al. 2005; Cheung et al. 2005, etc.). Source: Stranger and Dermitzakis 2006.
  • Slide 4
  • Why study gene expression: describe and dissect regulatory variation; annotate regulatory elements in the human genome; support disease studies to interpret statistical signals; distribution of molecular effects in the genome; natural selection.
  • Slide 5
  • Outline. Gene expression variation: recent studies; analysis of gene expression with HapMap phase II SNPs; update on CNV-expression associations; natural selection and cis regulatory effects.
  • Slide 6
  • Nature of regulatory variation (Stranger and Dermitzakis, Human Genomics 2005). [Diagram: DNA with regulatory regions (REG) and genes (GENE); expression, with labels i) Pre-mRNA, ii) mRNA, iii) Protein, iv) DNA.]
  • Slide 7
  • Effects of Copy Number Variation on gene expression
  • Slide 8
  • Gene expression association mapping (Stranger et al. PLoS Genet 2005). [Figure: genotype classes AA, AG, GG.]
  • Slide 9
  • Whole-genome gene expression. ~48,000 transcripts: 24,000 RefSeq, 24,000 other transcripts. 270 HapMap individuals: CEU, 30 trios (90 total); CHB, 45 unrelated; JPT, 45 unrelated; YRI, 30 trios (90 total). 2 IVTs per person, 2 replicate hybridizations per IVT. Quantile normalization of all replicates of each individual. Median normalization across all individuals of a population. Illumina Human 6 x 2 gene GEX arrays. [Diagram: cell line, RNA, IVT1 and IVT2, rep1 to rep4.]
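The two normalization steps named on Slide 9 can be illustrated with a minimal sketch. The slide does not give the exact procedure, so everything here is an assumption: the function names `quantile_normalize` and `median_center` are hypothetical, ties are broken by probe order, and "median normalization" is read as shifting each individual's profile to a shared population median.

```python
from statistics import mean, median

def quantile_normalize(replicates):
    """Give every replicate the same distribution of values.

    `replicates` is a list of equal-length lists: one list per hybridization,
    one value per probe. Ties are broken by probe order (a simplification).
    """
    n_probes = len(replicates[0])
    # For each replicate, probe indices ordered from lowest to highest value.
    orders = [sorted(range(n_probes), key=lambda i: rep[i]) for rep in replicates]
    # Reference distribution: mean of the k-th smallest value across replicates.
    reference = [
        mean(rep[order[k]] for rep, order in zip(replicates, orders))
        for k in range(n_probes)
    ]
    normalized = [[0.0] * n_probes for _ in replicates]
    for out, order in zip(normalized, orders):
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]
    return normalized

def median_center(individuals):
    """Shift each individual's profile so its median equals the pooled median.

    This is one possible reading of "median normalization", not a confirmed one.
    """
    target = median(v for profile in individuals for v in profile)
    centered = []
    for profile in individuals:
        shift = target - median(profile)
        centered.append([v + shift for v in profile])
    return centered
```

After `quantile_normalize`, the sorted values of every replicate are identical; only the assignment of values to probes differs between replicates.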
  • Slide 10
  • HapMap SNPs. Individuals: 60 CEU, 45 CHB, 44 JPT, 60 YRI. Phase I HapMap, MAF > 0.05: CEU 762,447 SNPs; CHB 695,601; JPT 689,295; YRI 799,242 (~1 SNP per 5 kb). 14,072 genes.
  • Slide 11
  • Copy Number Variation dataset. Genome Structural Variation Consortium (Redon et al. Nature, Nov 22, 2006). Array-CGH using a whole-genome tile path array; median clone size ~170 kb. All 270 HapMap individuals. Quantitative values (log2 ratios) representing diploid genome copy number, not genotypes. 1117 CNVs called from log2 ratios; calls based on the standard deviation of log2 ratios. Many CNVs experimentally verified. 26,563 clones; 93.7% of the euchromatic genome.
  • Slide 12
  • Linear regression for SNPs, CNVs and expression. [Figure axis: clone signal (log2 ratio).]
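As a rough illustration of the regression named on Slide 12, the sketch below fits expression on a single predictor by ordinary least squares. The additive 0/1/2 allele-count coding for SNP genotypes is an assumption (the slide does not state the coding), and the helper names are hypothetical. For a CNV, the clone's log2 ratio would be passed directly as `x`.

```python
def additive_coding(genotypes, counted_allele):
    """Code genotypes such as "AA", "AG", "GG" as copies of one allele (0, 1, 2)."""
    return [g.count(counted_allele) for g in genotypes]

def linear_regression(x, y):
    """Ordinary least squares fit of y on x.

    Returns (slope, intercept, r_squared). Assumes x is not constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Fraction of expression variance explained by the predictor.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared

# Toy data (invented for illustration, not from the study).
allele_counts = additive_coding(["AA", "AG", "GG", "AG"], "G")
slope, intercept, r2 = linear_regression(allele_counts, [1.0, 2.0, 3.0, 2.0])
```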
  • Slide 13
  • SNP cis-analysis: SNPs within 1 Mb of the probe midpoint (2 Mb window). [Diagram labels: SNPs, gene, probe.]
  • Slide 14
  • CNV cis-analysis: clone midpoint within 2 Mb of the probe midpoint (4 Mb window). [Diagram labels: clones, gene, probe.]
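The two cis windows on Slides 13 and 14 amount to a distance filter around the probe midpoint. The sketch below assumes all coordinates are on the same chromosome and that the window boundary is inclusive; neither detail is stated on the slides, and the function names are hypothetical.

```python
SNP_CIS_DISTANCE = 1_000_000  # SNPs within 1 Mb of the probe midpoint
CNV_CIS_DISTANCE = 2_000_000  # clone midpoints within 2 Mb of the probe midpoint

def midpoint(start, end):
    return (start + end) / 2

def cis_snps(snp_positions, probe_start, probe_end):
    """Keep SNP positions inside the 2 Mb window centred on the probe."""
    centre = midpoint(probe_start, probe_end)
    return [p for p in snp_positions if abs(p - centre) <= SNP_CIS_DISTANCE]

def cis_clones(clones, probe_start, probe_end):
    """Keep (start, end) clones whose midpoint lies inside the 4 Mb window."""
    centre = midpoint(probe_start, probe_end)
    return [
        (start, end)
        for start, end in clones
        if abs(midpoint(start, end) - centre) <= CNV_CIS_DISTANCE
    ]
```

Clones are compared by midpoint, not by overlap, which matters because the median clone is ~170 kb long.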
  • Slide 15
  • Permutation (Doerge and Churchill 1996). [Diagram: genotype matrix g11 ... gin beside gene expression values Exp1 ... Expi, with a "permute" step.] 10,000 permutations; each time keep the lowest p-value. Null distribution of 10,000 extreme p-values. Compare observed p-values to the tails of the null.
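The permutation scheme on Slide 15 can be sketched as follows. Two simplifications are assumed here and are not from the slide: the squared correlation is used as the test statistic in place of a regression p-value (for a fixed sample size the largest r-squared corresponds to the lowest p-value), and expression values are shuffled relative to genotypes, which leaves the correlation structure among SNPs intact. The function names are hypothetical.

```python
import random

def r_squared(x, y):
    """Squared Pearson correlation; 0.0 if either vector is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)

def best_statistic(genotypes_by_snp, expression):
    """Most extreme statistic over all SNPs tested against one gene."""
    return max(r_squared(g, expression) for g in genotypes_by_snp)

def permutation_p_value(genotypes_by_snp, expression, n_permutations=10_000, seed=0):
    """Gene-level p-value from a null distribution of per-permutation extremes."""
    rng = random.Random(seed)
    observed = best_statistic(genotypes_by_snp, expression)
    shuffled = list(expression)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        # Keep only the most extreme statistic from this permutation.
        if best_statistic(genotypes_by_snp, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)
```

Because only the extreme value of each permutation is kept, the resulting threshold accounts for the number of SNPs tested per gene.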
  • Slide 16
  • CNV vs. SNP associations (Stranger et al. Science 2007)
  • Slide 17
  • Slide 18
  • CNVs and SNPs mostly capture different effects. Relative impact on gene expression: 82% SNPs, 18% CNVs. Only 13% of genes with a CNV association also had a SNP association in the same population; biased toward large effect size. CNV and SNP variation are highly correlated (p-value 0.001).
  • Slide 19
  • Custom vs. genome-wide arrays [Stranger et al. 2005 PLoS Genet and Stranger et al. 2007 Science]. Two batches of 60 CEU individuals grown independently at two different labs. RNA extraction and labelling by different labs and people. Run on custom and genome-wide (gw) Illumina arrays. 97% of associations at the 0.05 permutation threshold from the custom array analysis were also detected in the gw analysis.
  • Slide 20
  • HapMap phase II analysis. ~4 million SNP genotypes made publicly available for the 270 HapMap individuals. Density: 1 SNP per 700 bp. Includes ~50% of expected common SNPs in these populations. 2.2 million SNPs analyzed (MAF > 0.05).
  • Slide 21
  • Phase I vs. Phase II. Cis- significant genes (0.001), given as phase I HapMap / both / phase II HapMap: CEU 286 / 258 / 299; CHB 317 / 269 / 318; JPT 337 / 297 / 341; YRI 356 / 310 / 394. Overlap percentages shown on the slide: 90%, 85%, 87%, 86%, 85%, 87%, 79%.
  • Slide 22
  • Slide 23
  • Population sharing of cis- associations
  • Slide 24
  • Associated SNP position relative to TSS
  • Slide 25
  • Distribution of regulatory elements around the TSS (ENCODE, Nature 2007)
  • Slide 26
  • Direction of allelic effect: same SNP-gene combination across populations. [Figure: AGREEMENT and OPPOSITE cases for Population 1 and Population 2; axis: log2 expression.]
  • Slide 27
  • Direction of allelic effect
  • Slide 28
  • Pooling populations: spurious associations. [Figure labels: Pop1, Pop2.]
  • Slide 29
  • Conditional permutations: permute data within each population separately, then perform the test (x 4).
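The conditional permutation on Slide 29 keeps each individual's population label fixed and shuffles values only among individuals of the same population, so that population-level differences in expression or allele frequency cannot generate signal in the null. A minimal sketch, with a hypothetical function name and invented toy values:

```python
import random

def permute_within_populations(values, populations, rng):
    """Return a copy of `values` shuffled separately inside each population.

    `values` and `populations` are parallel lists, one entry per individual.
    """
    permuted = list(values)
    indices_by_population = {}
    for index, population in enumerate(populations):
        indices_by_population.setdefault(population, []).append(index)
    for indices in indices_by_population.values():
        block = [values[i] for i in indices]
        rng.shuffle(block)
        for i, value in zip(indices, block):
            permuted[i] = value
    return permuted

# Toy example: values never cross the CEU/YRI boundary.
rng = random.Random(0)
labels = ["CEU", "CEU", "CEU", "YRI", "YRI", "YRI"]
shuffled = permute_within_populations([1, 2, 3, 10, 20, 30], labels, rng)
```

The association test is then run on the pooled, permuted data exactly as it is on the observed data.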
  • Slide 30
  • Multi-population analysis
  • Slide 31
  • Figure 2A. Number of populations sharing association in cis (single-population analysis) vs. proportion of single-population cis-associated genes detected in multi-population analysis.
  • Slide 32
  • SGPP2
  • Slide 33
  • Trans- phase II HapMap association. Biological hypotheses, functional categories: regulatory SNPs identified from cis- analysis (52%); non-synonymous SNPs (39%); splice site SNPs (7%); miRNA SNPs (1%). Genome-wide associations: ~25,000 SNPs per population x 14,072 genes. [Diagram: DNA, REG, GENE annotated with rSNPs, nsSNPs, spliceSNPs, miRNA SNPs.]
  • Slide 34
  • Trans- associations (14,072 genes tested). Correction at 0.001: 15 genes estimated false positives, FDR = 33%-39%. Correction at 0.01: 150 genes estimated false positives, FDR = 60%-75%. [Figure label: 10^-3 threshold.]
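The false-positive figures on Slide 34 are consistent with simple arithmetic: the number of genes tested multiplied by the per-gene threshold (14,072 x 0.001 is about 14, which the slide gives as 15). The sketch below assumes that FDR is computed as expected false positives divided by the number of significant genes; the slide does not state the formula, and the function names are hypothetical.

```python
def expected_false_positives(n_genes_tested, threshold):
    """Genes expected to pass a per-gene permutation threshold by chance."""
    return n_genes_tested * threshold

def false_discovery_rate(expected_fp, n_significant):
    """Share of significant genes expected to be false positives."""
    if n_significant == 0:
        return 0.0
    return expected_fp / n_significant

# Expected chance findings at the two thresholds on the slide.
fp_strict = expected_false_positives(14_072, 0.001)
fp_relaxed = expected_false_positives(14_072, 0.01)
```

Relaxing the threshold tenfold raises the expected false positives tenfold, so the FDR falls only if the number of significant genes grows faster than that.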
  • Slide 35
  • Trans- association ratios by SNP category (ratio, p-value). CEU: regulatory SNPs (cis 0.001) 6.05, 3.23E-24; nsSNPs 0.15, 1.22E-21; splice SNPs 0.49, 0.07; miRNA SNPs 0, 1. CHB: regulatory SNPs 3.69, 7.90E-10; nsSNPs 0.24, 1.91E-09; splice SNPs 0.76, 0.71; miRNA SNPs 0, 1. JPT: regulatory SNPs 3.15, 2.06E-07; nsSNPs 0.31, 8.82E-07; splice SNPs 0.71, 0.55; miRNA SNPs 0, 1. Enrichment of regulatory SNPs and deficit of nsSNPs in trans- associations: 3-6x more likely that a cis regulatory effect explains a trans regulatory effect.
  • Slide 36
  • Multi-population CNV analysis. Combined 4 populations: 193 genes at 0.001 (48 overlap with the 99 from single-population analysis). Combined 3 populations: 173 genes at 0.001 (42 overlap with the 99 from single-population analysis).
  • Slide 37
  • CNV trans effects. [Diagram labels: biological pathway; variable expression.]
  • Slide 38
  • Trans-position
  • Slide 39
  • Trans effects - CEU
  • Slide 40
  • Trans effects - YRI
  • Slide 41
  • Gene expression and natural selection. [Figure axes: TSS; -log p-value.] With Sridhar Kudaravalli and Jonathan Pritchard (unpublished)
  • Slide 42
  • Gene expression and natural selection With Sridhar Kudaravalli and Jonathan Pritchard (unpublished)
  • Slide 43
  • Co-segregating regulatory variants can drive differential isoform expression
  • Slide 44
  • SUMMARY. Cis- and trans- acting genetic variation influences mRNA levels. CNV effects detected are largely not captured by SNPs. Structural variation (copy number polymorphism) influences transcript level variation. Many detected associations are shared across human populations (replication of effects). Signal is concentrated symmetrically within 100 kb of the promoter. Trans-acting effects of CNVs: interpretation. Primary effects of trans associations are largely cis regulatory effects. Cis regulatory effects under positive selection.
  • Slide 45
  • Acknowledgements. Barbara Stranger, Alexandra Nica, Antigone Dimas, Christine Bird, Matthew Forrest, Catherine Ingle, Claude Beazley, Panos Deloukas, Matt Hurles. Cambridge University: Mark Dunning, Natalie Thorne, Simon Tavaré. Illumina: Jill Orwick, Mark Gibbs. Genome Structural Variation Consortium: Richard Redon, Nigel Carter, Charles Lee, Chris Tyler-Smith, Stephen Scherer. The HapMap Consortium. Stanford: Daphne Koller. Wellcome Trust for funding.