Regulatory variation and its functional consequences. Chris Cotsapas [email protected]. Motivating questions. How do phenotypes vary across individuals? Regulatory changes drive cellular and organismal traits Likely also drive evolutionary differences - PowerPoint PPT Presentation
Regulatory variation and its functional consequences
Chris Cotsapas [email protected]
Motivating questions
• How do phenotypes vary across individuals?
  – Regulatory changes drive cellular and organismal traits
  – Likely also drive evolutionary differences
• How are genes (co)regulated?
  – Pathways, processes, contexts
Regulatory variation
• What do “interesting” variants do?
• Genetic changes to:
  – Coding sequence **
  – Gene expression levels
  – Splice isomer levels
  – Methylation patterns
  – Chromatin accessibility
  – Transcription factor binding kinetics
  – Cell signaling
  – Protein-protein interactions
~88% of GWAS hits are regulatory
Genetic variation alters regulation
• Protein levels
  – Maize (Damerval 94)
• Expression levels
  – Yeast, maize, mouse, humans (Brem 02, Schadt 03, Stranger 05, Stranger 07)
• RNA splicing
  – Humans (Pickrell 12, Lappalainen 13)
• Methylation and DNase I peak strength
  – Humans (Degner 12; Gibbs 12)
• cis-eQTL
  – The position of the eQTL maps near the physical position of the gene.
  – Promoter polymorphism?
  – Insertion/Deletion?
  – Methylation, chromatin conformation?
• trans-eQTL
  – The position of the eQTL does not map near the physical position of the gene.
  – Regulator?
  – Direct or indirect?
Modified from Cheung and Spielman 2009 Nat Gen
Genetics of gene expression (eQTL)
Cis-eQTL analysis: test SNPs within a pre-defined distance of the gene
[Figure: SNPs tested within a 1 Mb window on either side of the gene (expression probe)]
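The window-based selection on this slide can be sketched as a simple filter. This is a minimal illustration only: the function name `cis_snps`, the tuple layout `(snp_id, chromosome, position)`, and the SNP identifiers are hypothetical, and the window is assumed to extend 1 Mb beyond each end of the gene.

```python
def cis_snps(snps, gene_chrom, gene_start, gene_end, window_bp=1_000_000):
    """Return ids of SNPs on the gene's chromosome within window_bp of the gene.

    snps is an iterable of (snp_id, chromosome, position) tuples (assumed layout).
    """
    lo = max(0, gene_start - window_bp)   # window start, clipped at position 0
    hi = gene_end + window_bp             # window end
    return [
        snp_id
        for snp_id, chrom, pos in snps
        if chrom == gene_chrom and lo <= pos <= hi
    ]


# Hypothetical SNPs and gene coordinates, for illustration only.
example_snps = [
    ("snpA", "5", 39_500_000),   # 0.5 Mb upstream of the gene: inside the window
    ("snpB", "5", 42_000_000),   # 1.9 Mb downstream: outside the window
    ("snpC", "2", 40_000_000),   # different chromosome
]
selected = cis_snps(example_snps, "5", 40_000_000, 40_100_000)
print(selected)
```

Only the selected SNPs would then be tested against the expression of that gene, which keeps the number of tests per gene small compared with a genome-wide scan.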
QT association
• Analysis of the relationship between a dependent or outcome variable (phenotype) and one or more independent or predictor variables (SNP genotype)
Linear Regression Equation
$Y_i = b_0 + b_1 X_i + e_i$
[Figure: continuous trait value (y-axis) against number of A1 alleles (0, 1, 2; x-axis); intercept b0, slope b1]
Logistic Regression Equation
$\ln\left(\frac{p_i}{1 - p_i}\right) = b_0 + b_1 X_i + e_i$
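The linear model on this slide can be fit by ordinary least squares. The sketch below assumes genotypes are coded as the number of A1 alleles (0, 1 or 2) and uses made-up trait values; the function name `fit_additive_model` is illustrative and does not come from the slides.

```python
def fit_additive_model(genotypes, trait):
    """Least-squares estimates (b0, b1) for trait = b0 + b1 * genotype + error."""
    n = len(genotypes)
    mean_x = sum(genotypes) / n
    mean_y = sum(trait) / n
    # Sum of squares of X and cross-product of X and Y around their means.
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, trait))
    b1 = sxy / sxx            # slope: change in trait per extra A1 allele
    b0 = mean_y - b1 * mean_x # intercept: expected trait with zero A1 alleles
    return b0, b1


# Made-up data: each extra A1 allele adds 2 units to the trait.
allele_counts = [0, 1, 2, 0, 1, 2]
trait_values = [1.0, 3.0, 5.0, 1.0, 3.0, 5.0]
b0, b1 = fit_additive_model(allele_counts, trait_values)
print(b0, b1)
```

In an eQTL setting the trait would be the expression level of one gene, and the test of interest is whether the slope b1 differs from zero.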
eQTL analysis: a GWAS for every gene
[Figure: one association scan per gene, shown for gene 1, gene 2, gene 3, gene 4, gene 5, ..., gene N]
cis-eQTLs are rather common
Nica et al PLoS Genet 2011
Cis-eQTLs cluster around TSS
Stranger et al PLoS Genet 2012
trans hotspots (yeast)
Brem et al Science 2002
Yvert et al Nat Genet 2003
DOES REGULATORY VARIATION ALTER PHENOTYPE? APPLICATION TO GWAS
Candidate genes, perturbations underlying organismal phenotypes
Rationale
• How do disease/trait variants actually alter biology?
• If they change regulation, then:
  – Change in gene expression/isoform use
  – Phenotypic consequence*
Compare patterns of association
[Figure: a GWAS peak aligned against the eQTL for gene 1 and the eQTL for gene 2]
Pearson’s covariance for windows of 51 SNPs between –log(p) in 2 traits
[Figure: CD GWAS p and eQTL p along the region]
Detect a peak when effect is the same
No peak when there are independent hits near each other
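The window comparison described on this slide can be sketched as follows. One assumption to flag: the slide says "Pearson's covariance", and this sketch uses the Pearson correlation coefficient of the –log10(p) values within each window, which may differ from the statistic actually used. The function names and the p-values are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def windowed_correlation(p_trait1, p_trait2, window=51):
    """Correlation of -log10(p) between two traits in sliding windows of SNPs.

    Both lists must cover the same SNPs in the same genomic order.
    Returns one value per window centre, skipping the incomplete edge windows.
    """
    a = [-math.log10(p) for p in p_trait1]
    b = [-math.log10(p) for p in p_trait2]
    half = window // 2
    return [
        pearson(a[i - half:i + half + 1], b[i - half:i + half + 1])
        for i in range(half, len(a) - half)
    ]


# Toy example with a 3-SNP window: identical association patterns.
gwas_p = [0.1, 0.01, 0.001, 0.01, 0.1]
eqtl_p = [0.1, 0.01, 0.001, 0.01, 0.1]
shared = windowed_correlation(gwas_p, eqtl_p, window=3)
print(shared)
```

When the two traits share the same pattern of association across a window, the correlation is high; two independent signals that merely sit near each other produce different patterns and a low value.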
Crohn’s/eQTL analysis
• CD meta analysis (GWAS only)
• CEU Hapmap LCL eQTL data
• Overlapping SNPs only (eQTL data has 610K SNPs, most in CD meta-analysis)
• Test 133 associations (total 1054 tests)
Crohn’s/eQTL analysis
SNP          CHR   Gene
rs11742570   5     PTGER4
rs12994997   2     ATG16L1
rs11401      16    SPNS1
rs10781499   9     INPP5E
rs2266959    2     C22orf29
A peak implies that the same effect drives GWAS and eQTL
MS/eQTL analysis
SNP           CHR   Gene
rs6880778     5     PTGER4
rs7132277     12    CDK2AP
rs7665090     4     CISD2
rs2255214     3     GOLGB1 & EAF2
rs201202118   12    METTL1 & TSFM
rs12946510    17    ORMDL3, STARD3 & ZPBP2
rs2283792     22    PPM1F
rs7552544     1     SLC30A7
rs34536443    19    SLC44A2
A peak implies that the same effect drives GWAS and eQTL
DOES REGVAR REVEAL CO-REGULATION? A.K.A. WHERE ARE THE TRANS eQTLS?
Open question
Whole-genome eQTL analysis is an independent GWAS for expression of each gene
[Figure: one genome-wide scan per gene, shown for gene 1, gene 2, gene 3, gene 4, gene 5, ..., gene N]
Issues with trans mapping
• Power
  – Genome-wide significance is 5e-8
  – Multiple testing on ~20K genes
  – Sample sizes clearly inadequate
• Data structure
  – Bias corrections deflate variance
  – Non-normal distributions
• Sample sizes
  – Far too small
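The power problem can be made concrete with a little arithmetic. The sketch below assumes a plain Bonferroni correction of the genome-wide threshold across genes; the slides do not say which correction was intended, so treat this as an illustration of scale rather than the threshold actually applied.

```python
def trans_eqtl_threshold(genome_wide_alpha=5e-8, n_genes=20_000):
    """Per-test p-value threshold if a genome-wide scan is repeated for every gene.

    Assumes a simple Bonferroni correction across genes (an assumption,
    not stated on the slide).
    """
    return genome_wide_alpha / n_genes


threshold = trans_eqtl_threshold()
print(f"per-test threshold: {threshold:.1e}")
```

With roughly 100 individuals per population, an effect would need to be very large to reach a threshold of this order, which is the sense in which sample sizes are inadequate.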
But…
• Assume that trans eQTLs affect many genes…
• …and you can use cross-trait methods!
Association data
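$$
\begin{pmatrix}
Z_{1,1} & Z_{1,2} & \cdots & Z_{1,p} \\
Z_{2,1} & & & \\
\vdots & & \ddots & \vdots \\
Z_{s,1} & & \cdots & Z_{s,p}
\end{pmatrix}
$$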
Cross-phenotype meta-analysis
$S_{CPMA} \sim \dfrac{L(\text{data} \mid \lambda \neq 1)}{L(\text{data} \mid \lambda = 1)}$
Cotsapas et al, PLoS Genetics
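A minimal sketch of one way to compute a likelihood ratio of this form for a single SNP. It assumes that –ln(p) across phenotypes follows an exponential distribution with rate λ, so that λ = 1 corresponds to uniform (null) p-values; the slide gives only the ratio, so the exponential form, the maximum-likelihood estimate of λ, and the function name `cpma_statistic` are assumptions made for illustration.

```python
import math


def cpma_statistic(p_values):
    """Likelihood-ratio statistic comparing a fitted exponential rate with rate 1.

    x_i = -ln(p_i) is modelled as Exponential(lambda) (assumed model).
    Returns 2 * (log L(lambda_hat) - log L(lambda = 1)), which is >= 0.
    """
    x = [-math.log(p) for p in p_values]
    n = len(x)
    total = sum(x)
    lam_hat = n / total                                   # MLE of the exponential rate
    loglik_alt = n * math.log(lam_hat) - lam_hat * total  # log-likelihood at lambda_hat
    loglik_null = -total                                  # log-likelihood at lambda = 1
    return 2.0 * (loglik_alt - loglik_null)


# Made-up p-values for one SNP across several expression traits.
mostly_null = [0.9, 0.5, 0.3, 0.2, 0.7, 0.1]
many_small = [1e-4, 1e-3, 0.01, 0.002, 0.03, 1e-5]
print(cpma_statistic(mostly_null), cpma_statistic(many_small))
```

A SNP associated with many traits at once produces an excess of small p-values, which pulls the fitted rate away from 1 and inflates the statistic.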
CPMA for correlated traits
• Empirical assessment to account for correlation
• Simulate Z scores under covariance, recalculate CPMA
• Construct distribution of CPMA for dataset, call significance
with Ben Voight, U Penn
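The empirical procedure on this slide can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: Z scores are drawn from a multivariate normal with a given trait covariance (via a hand-written Cholesky factor), converted to two-sided p-values, and scored with an exponential-rate likelihood ratio as a stand-in for CPMA. All function names and the example covariance are hypothetical.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular L with L * L^T = cov (cov must be positive definite)."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low


def cpma_like(p_values):
    """Stand-in CPMA score: likelihood ratio for the exponential rate of -ln(p)."""
    n = len(p_values)
    mean_x = sum(-math.log(max(p, 1e-300)) for p in p_values) / n
    return 2.0 * n * (mean_x - 1.0 - math.log(mean_x))


def simulate_null_scores(cov, n_sims, rng):
    """Scores for Z vectors simulated under the null with trait covariance cov."""
    low = cholesky(cov)
    n = len(cov)
    scores = []
    for _ in range(n_sims):
        indep = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = [sum(low[i][k] * indep[k] for k in range(i + 1)) for i in range(n)]
        p = [math.erfc(abs(v) / math.sqrt(2.0)) for v in z]  # two-sided p-value
        scores.append(cpma_like(p))
    return scores


def empirical_p(observed, null_scores):
    """Fraction of null scores at least as large as observed (with +1 correction)."""
    exceed = sum(s >= observed for s in null_scores)
    return (1 + exceed) / (1 + len(null_scores))


# Hypothetical 3-trait covariance with correlated traits.
trait_cov = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]]
null_scores = simulate_null_scores(trait_cov, 200, random.Random(1))
print(empirical_p(25.0, null_scores))
```

Because the null distribution is built from Z scores that carry the same correlation as the real traits, a high score caused purely by correlated expression is not called significant.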
Experimental design
• 610,180 SNPs
  – MAF > 0.15 in CEU and YRI
  – LD pruned (r2 < 0.2)
• 8368 transcripts
  – Detectable on Illumina arrays
• 108 CEU individuals*
• 109 YRI individuals*
* Stranger et al Nat Genet 2007 (LCL data; publicly available)
[Figure: analysis flow. plink: CEU p-values and YRI p-values from Transcript ~ SNP, sex. CPMA: CEU CPMA scores and YRI CPMA scores. Threshold: >95%ile sim CPMA]
Target sets of genes
• trans-acting variant: SNP with CPMA evidence
• Target genes: genes affected by trans-acting variant (i.e. regulon)
Prediction 1
• Allelic effects should be conserved between two populations
  – Binomial test on paired observations for all genes with P < 0.05 in at least one population
True for 1124/1311 SNPs (binomial p < 0.05)
[Figure: genes with pCEU < 0.05 and genes with pYRI < 0.05; direction of effect per gene, e.g. CEU + + - - + against YRI + + - - + (same directions) or YRI - - + + - (opposite directions)]
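The paired comparison of effect directions can be sketched as a sign test. The sketch assumes a one-sided binomial test of whether more gene pairs agree in direction than the 50% expected by chance; the slide does not state the sidedness, and the function name and effect values are made up.

```python
from math import comb


def sign_concordance_test(effects_pop1, effects_pop2):
    """One-sided binomial test that effect directions agree more often than chance.

    Inputs are per-gene allelic effects for the same SNP in two populations.
    Returns (n_concordant, n_pairs, p_value) with p = P(X >= n_concordant),
    X ~ Binomial(n_pairs, 0.5). Pairs with a zero effect are dropped.
    """
    pairs = [(a, b) for a, b in zip(effects_pop1, effects_pop2) if a != 0 and b != 0]
    n = len(pairs)
    k = sum((a > 0) == (b > 0) for a, b in pairs)
    p_value = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return k, n, p_value


# Made-up effects for five genes, mirroring the + + - - + pattern in the figure.
ceu = [0.4, 0.2, -0.3, -0.1, 0.5]
yri_same = [0.3, 0.1, -0.2, -0.4, 0.2]
yri_flipped = [-0.3, -0.1, 0.2, 0.4, -0.2]
print(sign_concordance_test(ceu, yri_same))
print(sign_concordance_test(ceu, yri_flipped))
```

With only five genes, even perfect agreement gives a modest p-value, so in practice the test draws its power from the many genes that pass the P < 0.05 filter for each SNP.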
Prediction 2
• Target genes should overlap
  – Identify by mixture of gaussians classification
  – Empirical p from distribution of overlaps between NCEU and NYRI genes across SNPs.
True for 600/1311 SNPs (empirical p < 0.05)
[Figure: overlap between genes with pCEU < 0.05 and genes with pYRI < 0.05]
What about the target genes?
• Regulons:
  – Encode proteins more connected than expected by chance
www.broadinstitute.org/mpg/dapple.php
Rossin et al 2011 PLoS Genetics
What about the target genes?
• Regulons:
  – Encode proteins enriched for TF targets (ENCODE LCL data)
  – 24/67 filtered TFs significant
  – Binomial overlap test
TF       p-value
CEBPB    3.7 x 10^-142
HDAC8    7.8 x 10^-122
FOS      2.5 x 10^-96
JUND     3.7 x 10^-88
NFYB     3.3 x 10^-71
ETS1     3.8 x 10^-63
FAM48A   2.1 x 10^-61
FOXA1    1.4 x 10^-33
GATA1    4.6 x 10^-33
HEY1     7.8 x 10^-32
[Figure: overlap between trans target genes and ChIP-seq LCL target genes]
Summary
• Regulatory variation is common
• It affects gene expression levels
• Likely many other types:
  – DNA accessibility, chromatin states
  – Transcript splicing, processing, turnover
• Has phenotypic consequences
  – GWAS
  – Some cellular assays (not discussed here)
Open questions
• Discover regulatory elements (cis)
  – Promoters, enhancers etc
• Gene regulatory circuits (trans)
• Dynamics of regulation
  – Splicing variation, processing, degradation
• Phenotypic consequences
  – Cellular assays required
• Tie in to organismal phenotype
NEXT-GEN SEQUENCING DATA
RNAseq, GTEx
GTEx – Genotype-Tissue EXpression
An NIH common fund project
Current: 35 tissues from 50 donors
Scale up: 20K tissues from 900 donors.
Novel methods groups: 5 current + RFA
How can we make RNAseq useful?
• Standard eQTLs
  – Montgomery et al, Pickrell et al Nature 2010
• Isoform eQTLs
  – Depth of sequence!
• Long genes are preferentially sequenced
• Abundant genes/isoforms ditto
• Power!?
• Mapping biases due to SNPs
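The length bias noted above is commonly handled by normalizing read counts for transcript length before comparing genes. The sketch below uses transcripts per million (TPM) as one such normalization; it is offered as an illustration of the idea and is not a method named in the slides. The counts and lengths are made up.

```python
def transcripts_per_million(read_counts, lengths_kb):
    """Length-normalized expression: reads per kb, rescaled to sum to one million."""
    rates = [count / length for count, length in zip(read_counts, lengths_kb)]
    total = sum(rates)
    return [rate / total * 1_000_000 for rate in rates]


# Two hypothetical genes at equal abundance: the 2 kb gene collects twice the reads.
counts = [100, 200]
lengths = [1.0, 2.0]
tpm = transcripts_per_million(counts, lengths)
print(tpm)
```

After normalization the two genes receive the same value, so the extra reads drawn by the longer gene no longer look like higher expression. Statistical power is still greater for long and abundant transcripts, because they are measured with more reads.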
RNAseq combined with other techs
• Regulons: TF gene sets via ChIP-seq
  – Look for trans effects
• Open chromatin states (DNase I; methylation)
  – Find active genes
  – Changes in epigenetic marks correlated to RNA
  – Genetic effects
• RNA/DNA comparisons
  – Simultaneous SNP detection/genotyping
  – RNA editing ???