51
eQTLs and sQTLs…. Robert Gentleman Sahar Mozaffari Rob Tunney other members of Computational Biology at 23andMe

eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

eQTLsandsQTLs….

RobertGentlemanSaharMozaffariRobTunney

othermembersofComputationalBiologyat23andMe

Page 2: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

Outline

•  whyisthisinteresting…•  whatisaneQTLorsQTL•  howdowefindthem•  whatarethetechnicalissues•  whataretheinferentialissues

Page 3: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

AclassicDEexperiment•  takesomenumberofsampleswithadiseaseandsomenumberwithoutout(nD,nCtrl)

•  obtainRNA-seq(orsimilar,couldbeproteinlevelsormethylationor….)

•  thenperformanDifferentialExpressionanalysis•  finddifferentiallyexpressedgenesandthentrytounderstandwhicharecausalandwhichareconsequences

•  inpracticemost(sometimesalmostall)DEgenesarenotcausalandhencewillnotbegoodtargetsfortherapeuticintervention

•  alsonotlikelybiomarkersinthesensethattheyarenotnecessarilypredictiveoflongtermconsequences

Page 4: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

DEexperiment

•  samplesizestendtobesmall•  itisnotclearhowrepresentativeofallpeoplewiththediseasethesampleis

•  manyoftheDEgenesareconsequences– egaTFisexpressedinatissueitshouldbesilent– egincancerlargegenomicrearrangementsyieldmanycorrelatedchanges,somesmallnumbermaybecausal–theremainderarepassengers

Page 5: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

Adifferentapproach

•  inmanycasessomeonehasperformedagenomewideassociationstudyforthedisease

•  theideabehindaGWASistoidentifygeneticvariantsthatassociatewiththedisease

•  theGWASvariantismorelikelytobecausalsincegenomecomesfirstandphenotypetemporallysecond

•  samplesizestendtobelarge(egUKBiobankhasabout500KpeopleandprovidesusefulGWASformanydiseases)

•  theproblemwithaGWASisthatthevariantweidentifymaynotbethecausalvariant

Page 6: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

GWAS•  so,whatwewanttodoistotrytofindsomefunctionalvariantthatisinhighlinkagedisequilibriumwiththetagvariant

•  simpleexamples:thetagvariantisacodingvariantinsomegene–  CFTRforcysticfibrosis–  BRCA1/BRCA2forbreastcancer

•  somediseasecausingvariantsmediatetheiraffectthroughchangesinproteinabundance– mRNAisaeasytomeasuresurrogateforproteinabundance

Page 7: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

Thetwoapproachesarecomplementarynotcompetitive

•  bothdifferentialexpressionexperimentsandGWAScanhelpidentifylikelycausalgenes

•  eachhastheirstrengthsandweaknessesandthebetterweareabletocombinethemtheeasieritwillbetoidentifynovelregulatorygenesthatmaybeusefulasdrugtargetsorasbiomarkersfortherapy…

•  thereisnogoodreasonyoucouldnotextendeithertousefeatures/approachesfromtheother

Page 8: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

•  approximately3.5billionnucleotidesinasinglecopyofthehumangenome

•  wecansequencethegenomeandattempttomeasureitateverylocation–  thecostofWholeGenomeSequencing(WGS)isabout$1500/perperson(dependsondepthandscale)

•  wecangenotypeindividualsandthenimpute–  thecostofusingagenotypingarrayislessthan$50/perperson(~700Kvariants);totalcostabout$100/pp

TheHumanGenome

Page 9: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

•  the human genome is encoded on 22 autosomes (each of us has a pair) and the sex chromosomes (X and Y) – making 23 pairs

•  it is about 3.5B nucleotides long (ACTG) •  variation in the sequence of the genome is

associated with human disease •  but finding the actual cause can be challenging •  this is referred to as the fine mapping problem

Genetics Primer

Page 10: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

CrossingOver

http://www.macmillanhighered.com/BrainHoney/Resource/6716/digital_first_content/trunk/test/morris2e/asset/img_ch11/morris2e_ch11_fig_11_09.html

•  primarilyoccursduringmeiosis(creationofgametes)•  twoorthreeeventsperchromosomepermeiosis

Page 11: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

•  Linkagedisequilibrium(astrongassociationbetweennearbyvariants)causesconfoundingandmakesithardtoidentifythelikelycausalvariant

•  wedon’treallyhaveaperfectreference•  thereislotsofvariationthatisnotyetaccountedforinthereferencesequence

•  thereferenceshouldbepopulationspecific•  weknowlittleaboutlarger(structural)variants

•  phasing:wehave2copiesofeachchromosome,phasingisusedtoassignavarianttoaspecificoneofthepair

Afewcomplications

Page 12: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

•  to genotype at scale a good strategy is to use arrays for most people and impute

•  basically with imputation we take advantage of the linkage disequilibrium •  individuals that are identical at a subset of genetic

variants will likely be identical in between those variants

Genotyping at scale

Page 13: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

Imputation•  OurSNParrayshaveonly~700kmarkersonthem(sparsecompared

tothesizeofthehumangenome)

FiguresfromLietal.,AnnuRevGenomicsHumGenet2009.

Page 14: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

Imputation•  OurSNParrayshaveonly~700kmarkersonthem(sparsecompared

tothesizeofthehumangenome)

FiguresfromLietal.,AnnuRevGenomicsHumGenet2009.

Page 15: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

Imputation•  OurSNParrayshaveonly~700kmarkersonthem(sparsecompared

tothesizeofthehumangenome)

FiguresfromLietal.,AnnuRevGenomicsHumGenet2009.

Page 16: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

Imputation•  OurSNParrayshaveonly~700kmarkersonthem(sparsecompared

tothesizeofthehumangenome)

FiguresfromLietal.,AnnuRevGenomicsHumGenet2009.

Page 17: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

•  thequalityoftheimputationdependsonthesize(numberofindviduals)inthereferencepanel

•  andonhowwellthereferencepanelmatchesthepopulationthatwasgenotyped

•  wecurrentlyimputeupto25Mvariantsaccurately

•  wearedevelopingreferencepanelsfordifferentethnicities

Imputation

Page 18: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

GWAS

•  Genome-wideassociationstudy•  basicallyalogisticregressionateverylocustoassociateaphenotype(presence/absence)withgeneticvariation

•  wethenexaminethoseforwhichthep-valueislessthan5e-8(orthere-abouts)–whicharecalledhits

•  thehitindicatesanassociationbetweenvariationatthelocusandriskofthephenotype/disease

Page 19: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

Ourgenotypeaffectsourcharacteristics

•  whatfoodwelike•  howtallweare•  howfatorthin•  whatdiseaseswearesusceptibleto•  behaviors–risktaking,depressionetc.

Page 20: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

FromtheFun

variantsthatassociatewithapreferenceforStrawberryicecreamovervanillatheyareinolfactoryreceptors23andMeBlog

Page 21: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

TotheSerious

Page 22: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

Somecomplexities

•  therecanbeverycomplexgeneticinteractionsthatleadtodisease

•  therecanbegenebyenvironmentinteractionsthataffectrisk– egriskofsmokingandriskoflungdiseases(cancer,COPDetc)

•  SpitzMR,AmosCI,DongQ,LinJ,WuX.TheCHRNA5-A3regiononchromosome15q24-25.1isariskfactorbothfornicotinedependenceandforlungcancer.JNatlCancerInst2008;100:1552-6.doi:10.1093/jnci/djn363pmid:18957677

Page 23: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which
Page 24: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

eQTL

•  aneqtlisanexpressionQuantitativeTraitLocus

•  weessentiallyperformaGWASusingtheexpressionvalueofthegeneasatrait– butthereare20Khumangenes,sothiswouldleadtoanamazingamountofcomputationandmultipletestingcorrection

– somostpeopledosomeformofcis-eQTLanalysisusingonlySNPswithinsomekMBoftheTSSorTSE.Oftenk=0.5MB

Page 25: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

eQTL

•  expressionQuantitativeTraitLocus– alocationinthegenomewherethereispolymorphicexpression(thenucleotideatthatpositionvariesinthepopulation)

– agene,whoseexpressionappearstobeassociatedwiththatvariationatsomelevel

Page 26: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

It is all about power….

Page 27: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

eQTLsinpractice

•  theactualmodelingiscomplexasthereisoftenaneedtocorrectfor:– unknownexpressionbatcheffects(PEER)– unknownpopulationstructure(geneticPCs)– otherknown,orpossibleconfounders(oftenageandsex)

•  todatethereislittleconcernwithperformingconditionalanalysisandusuallyjustthetophitinsomelocusisobtained–  thistendstofavorcommonalleles..duetopower

Page 28: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

Datarequirement

•  weneedsomenumberofindividualswhohavebeenbothgenotypedandhadRNA-seq(orsimilar)carriedoutonthem

•  fromtheRNA-seqwecancomputeexpressionlevels(insomeunits)andassesslocalstructure(egaresomeexonsskipped)

Page 29: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

GEUVADIS

Lappalainen et al. 2013 Nature http://dx.doi.org/10.1038/nature12531

Page 30: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

●  462 individuals with expression

●  445 pass 1000 Genomes Phase 3 QC ○  358 EUR ○  87 YRI

●  Across 7 labs

GEUVADIS Population Sample

size used

CEU 89

GBR 92

FIN 86

TSI 91

YRI 87

Page 31: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

Distance of conditional eQTL from TSS & primary eQTL

Page 32: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

LD R2 of conditional eQTL with primary eQTL

Page 33: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

Conditional eQTL MAF & estimated effect size

Page 34: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

sQTL analysis can compute effects on isoform abundances...

NCBI: http://www.genome.gov/Images/EdKit/bio2j_large.gif

X% Y% Z%

Page 35: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

...or on the splicing frequency of individual exons

NCBI: http://www.genome.gov/Images/EdKit/bio2j_large.gif

X% Y% Z%

Included: X+Z%

Excluded: Y%

Page 36: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

1.  Less sensitive to 3′ recovery bias 2.  Less computationally complex

a.  2 vs. 2n outcomes 3.  Interpretable for TX

a.  How often is a domain spliced in?

Exon Level sQTL analysis SLC221A Read Counts

vs. Exon Position

Page 37: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

I2

I1

E

Cassette exons are simple alternative splicing events

Page 38: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

Inclusion counts = 0.5 * (I1 + I2)

I2

I1

E

Exclusion counts

Use junction reads to assess splicing ratios

Page 39: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

Inclusion counts = 0.5 * (I1 + I2) Percent spliced in (ѱ) = Incl. / (Incl. + Excl.)

I2

I1

E

Exclusion counts

Use junction reads to assess splicing ratios

Page 40: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

1.  Each event has 2 counts: inclusion, exclusion 2.  How to associate genotype with these counts?

a.  Common: compute PSI, pass to FastQTL b.  GLM on one count, using total counts as offset term

3.  Inclusion/exclusion controls for gene expression

Splicing Data and Modeling

I2

I1

E

Page 41: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

1.  Data a.  114 GTEx Liver samples

2.  Splicing quantification a.  Spliced RNA-Seq alignment with STAR b.  Annotated cassette exons from VastDB c.  Compute inclusion/exclusion from junction reads

3.  sQTL association a.  Test each exon for association with cis-SNPs b.  SNPs in window 20 kb 5′ and 3′ of exon

sQTL Analysis - Overview

Page 42: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

1.  Negative binomial regression (glm.nb in R) 2.  For each cassette exon, for sample i:

a.  xi = inclusion counts b.  Ni = inclusion counts + exclusion counts

3.  Covariates: age, sex, WGS platform, surgical/postmortem, 5 genetic PCs

sQTL Association Testing - Model

Page 43: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

1.  Data requirements to test exons a.  ≥10 junction reads in ≥40 samples b.  ≥2% minor allele in sample c.  ≥10% samples with alternative splicing

2.  Test exons for overdispersion w.r.t poisson regression model a.  Overdispersed -> NB regression b.  Not overdispersed -> Poisson regression

sQTL Association Testing - Model

Page 44: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

sQTLResults

Page 45: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

1.  17355 candidate exons 2.  8723 exons pass data filters 3.  611 exons with ≥1 sQTL hit

GTEx Liver sQTL results

Page 46: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

Top Hits

Page 47: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

Top Hits

Page 48: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

Splice Site sQTLs - CAST

Coulombe-Huntington et al. 2009, PLOS Genetics

Page 49: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

Top Hit, chr5:96076449-96076487

Page 50: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

Splice Site sQTLs - METTL2B

Page 51: eQTLs and sQTLs…. · associate a phenotype (presence/absence) with genetic variation • we then examine those for which the p-value is less than 5 e-8 (or there-abouts) – which

Splice Site sQTLs - APIP