97
2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: Generalized Linear Modeling Jon Wakefield Departments of Statistics and Biostatistics University of Washington 1 / 95

2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

2019 SISG Module 8: Bayesian Statistics forGenetics

Lecture 7: Generalized Linear Modeling

Jon Wakefield

Departments of Statistics and BiostatisticsUniversity of Washington

1 / 95

Page 2: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Outline

Introduction and Motivating Examples

Generalized Linear ModelsBayes Linear ModelBayes Logistic Regression

Generalized Linear Mixed ModelsTemporal SmoothingASE Example

Approximate Bayes InferenceThe Approximation

Appendix: Hierarchical Modeling of Allele-Specific Expression DataMotivationModeling

2 / 95

Page 3: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Introduction

3 / 95

Page 4: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Introduction

In this lecture we will discuss Bayesian modeling in the context ofGeneralized Linear Models (GLMs).

This discussion will include the addition of random effects, i.e. we’llconsider the class of Generalized Linear Mixed Models (GLMMs).

Estimation via the quick INLA technique will be demonstrated, alongwith its R implementation.

An approximation technique that is useful (in particular) in the contextof Genome Wide Association Studies (GWAS) (in which the numberof rows of data to analyze is large) will also be introduced.

The accompanying R code allows the analyses presented here to bereplicated.

A complex mixture model for ASE is included in the Appendix, toillustrate some of the flexibility of Bayes modeling.

4 / 95

Page 5: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Motivating Example: Logistic Regression

We consider case-control data for the disease Leber Hereditary OpticNeuropathy (LHON) disease with genotype data for markerrs6767450:

CC CT TT Totalx = 0 x = 1 x = 2

Cases 6 8 75 89Controls 10 66 163 239Total 16 74 238 328

Let x = 0,1,2 represent the number of T alleles, and p(x) theprobability of being a case, given x copies of the T allele.

5 / 95

Page 6: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Motivating Example: Logistic Regression

For such case-control data one may fit the multiplicative odds model:

p(x)

1− p(x)= exp(α)× exp(θx),

with a binomial likelihood.

Interpretation:I exp(α) is of little interest given the case-control sampling.I exp(θ) is the odds ratio describing the multiplicative change in

risk for one T allele versus zero T alleles.I exp(2θ) is the odds ratio describing the multiplicative change in

risk for two T alleles versus zero T alleles.I Odds ratios approximate the relative risk for a rare disease.

A Bayesian analysis adds a prior on α and θ.

6 / 95

Page 7: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Motivating Example: FTO Data

RecallI Y = weightI xg = fto heterozygote ∈ {0,1}I xa = age in weeks ∈ {1,2,3,4,5}

We will examine the fit of the model

E[Y |xg, xa] = β0 + βgxg + βaxa + βintxgxa,

with independent normal errors, and compare with a Bayesiananalysis.

7 / 95

Page 8: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Motivating Example: RNA Seq with Replicates

I We report an experiment carried out in a previous collaboration,see Connelly et al. (2014) for further details.

I Start with two haploid yeast strains (individuals).I From these we obtain RNA-Seq data, where we isolate RNA

from the two individuals, fragment and sequence it usingnext-generation sequencing, and map the sequencing readsback to the genome to generate RNA levels in the form of countsof the number of sequencing reads mapping at each gene.

I Also mate the two haploid yeast strains together to form a diploidhybrid. We again isolate RNA, fragment, and sequence it.

I Then take advantage of polymorphisms between the two strainsin order to map reads to either of the two haploid individuals,giving us counts for the number of reads mapping to either one ofthe parental genomes in the diploid hybrid for each gene.

8 / 95

Page 9: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Motivating Example: RNA Seq with Replicates

I We are interested in two questions from this data. First, we wantto look for evidence of trans effects at each gene; in biologicalterms, this means that polymorphisms located far from the geneare responsible for differences in RNA levels.

I To detect this, look for genes where the difference between RNAlevels in the haploids differs from the difference between RNAlevels for the two parental strains in the diploid.

I The question we concentrate on looking for cis effects, these arepolymorphisms near the gene itself that are responsible fordifferences in RNA levels.

I We can detect cis effects as a difference in the count of readsmapping to each of the parental strains in the diploid at a gene,is the probability of arising from the two parents, 0.5?

9 / 95

Page 10: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Motivating Example: RNA Seq Data, Statistical Model

There are two replicates and so for each of N genes we obtain twosets of counts.

For the diploid hybrid let :I Yij be the number of A alleles for gene i and replicate j , andI Nij is the total number of counts, so that Nij − Yij is the number of

T alleles j = 1,2.

10 / 95

Page 11: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Motivating Example: RNA Seq Data, Statistical Model

We fit a random effects logistic regression model starting with firststage:

Yij |Nij ,pij ∼ Binomial(Nij ,pij )

so that pij is the probability of seeing an A read for gene i andreplicate j .

At the second stage:

pij

1− pij= exp(θi + εij ),

whereεij ∼ N(0, σ2),

represent random effects that allow for excess-binomial variation.

In the model θi is a parameter of interest – if a (say) 95% posteriorinterval estimate contains 0 then we have evidence of cis effects.

11 / 95

Page 12: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

GLMs

12 / 95

Page 13: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Generalized Linear Models

I Generalized Linear Models (GLMs) provide a very usefulextension to the linear model class.

I GLMs have three elements:1. The responses follow an exponential family.2. The mean model is linear in the covariates on some scale.3. A link function relates the mean of the data to the covariates.

I In a GLM the response yi are independently distributed andfollow an exponential family1, i = 1, . . . ,n.

I Examples: Normal, Poisson, binomial.

1so that the distribution is of the form p(yi |θi , α) = exp({yiθi − b(θi )}/α+ c(yi , α)),where θi and α are scalars

13 / 95

Page 14: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Generalized Linear Models

I The link function g(·) provides the connection between the meanµ = E[Y ] and the linear predictor xβ, via

g(µ) = xβ,

where x is a vector of explanatory variables and β is a vector ofregression parameters.

I For normal data, the usual link is the identity

g(µ) = µ = xβ.

I For binary data, a common link is the logistic

g(µ) = log

1− µ

)= xβ.

I For Poisson data, a common link is the log

g(µ) = log (µ) = xβ.

14 / 95

Page 15: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Bayesian Modeling with GLMs

I For a generic GLM, with regression parameters β and a scaleparameter α, the posterior is

p(β, α|y) ∝ p(y |β, α)× p(β, α).

I An immediate question is: How to specify a prior distributionp(β, α)?

I How to perform the computations required to summarize theposterior distribution (including the calculation of Bayes factors)?

15 / 95

Page 16: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Bayesian Computation

Various approaches to computation are available:I Conjugate analysis — the prior combines with likelihood in such

a way as to provide analytic tractability (at least for someparameters).

I Analytical Approximations — asymptotic arguments used(e.g. Laplace).

I Numerical integration.I Direct (Monte Carlo) sampling from the posterior, as we have

already seen.I Markov chain Monte Carlo — very complex models can be

implemented, for example with WinBUGS, JAGS or Stan.I Integrated nested Laplace approximation (INLA). Cleverly

combines analytical approximations and numerical integration:we illustrate the use of this method in some detail.

16 / 95

Page 17: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Integrated Nested Laplace Approximation (INLA)

I The homepage of the INLA software is here:http://www.r-inla.org/home

I There are also lots of example links at this website.I The fitting of many common models is described here:

http://www.r-inla.org/models/likelihoodsI INLA can fit GLMs, GLMMs and many other useful model

classes.

17 / 95

Page 18: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

INLA for the Linear Model

I The model is

Y = E[Y |xg, xa] = β0 + βgxg + βaxa + βintxgxa + ε

where ε|σ2 ∼iid N(0, σ2).I This model has five parameters: the four fixed effects areβ0, βg, βa, βint and the error variance is σ2 (note that in inla

inference is reported for the precision σ−2).I In general, posterior distributions can be summarized graphically

or via numerical summaries.I In Figures 1 and 2 give posterior marginal distributions for the

fixed effects and hyperparameter σ−2, respectively, under ananalysis with relatively flat priors.

18 / 95

Page 19: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Comparison of OLS and Bayess

# OLSo ls . f i t <− lm ( l i n y ˜ l i n x g + l i n x a + l i n x i n t , data= f t o d f )# MLEs and SEscbind ( coef ( o l s . f i t ) , s q r t ( d iag ( vcov ( o l s . f i t ) ) ) )

[ , 1 ] [ , 2 ]( I n t e r c e p t ) −0.06821632 1.4222970l i n x g 2.94485495 2.0114316l i n x a 2.84420729 0.4288387l i n x i n t 1.72947648 0.6064695# INLAformula <− l i n y ˜ l i n x g + l i n x a + l i n x i n tl i n .mod <− i n l a ( formula , data= f t o d f , f a m i l y =” gaussian ” )# P o s t e r i o r means and SDsl i n . mod$summary . f i x e d [ c ( 1 , 2 ) ]

mean sd( I n t e r c e p t ) −0.06162681 1.4255270l i n x g 2.93325529 2.0135662l i n x a 2.84237281 0.4298868l i n x i n t 1.73261901 0.6073410

Virtually identical!

19 / 95

Page 20: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

−5 0 5

0.00

0.10

0.20

0.30

PostDens [(Intercept)]

Mean = −0.061 SD = 1.371

−5 0 5 10

0.00

0.10

0.20

PostDens [linxg]

Mean = 2.932 SD = 1.937

1 2 3 4 5

0.0

0.4

0.8

PostDens [linxa]

Mean = 2.842 SD = 0.413

−1 0 1 2 3 4 5

0.0

0.2

0.4

0.6

PostDens [linxint]

Mean = 1.733 SD = 0.584

Figure 1: Marginal distributions of the intercept and regression coefficients.

20 / 95

Page 21: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

0.2 0.4 0.6 0.8 1.0 1.2

01

23

4

PostDens [Precision for the Gaussian observations]

Figure 2: Marginal distribution of the error precision.

21 / 95

Page 22: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

INLA for the Linear Model

I As with a non-Bayesian analysis, model checking is importantand in Figure 3 we present a number of diagnostic plots.

I Plots:(a) Normality of residuals? Sample size is quite small.(b) Is the relationship with age linear?(c) Mean variance relationship?(d) Overall fit.

I For these data, the model assumptions look reasonable.

22 / 95

Page 23: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

FTO Diagnostic Plots

● ●

●●

●●●

−2 −1 0 1 2

−2−1

01

Theoretical Quantiles

Sam

ple

Quan

tiles

(a)

● ●

●●

● ●●

1 2 3 4 5

−2−1

01

Age

Resid

uals

(b)

● ●

●●

● ●●

5 10 15 20 25

−2−1

01

Fitted

Resid

uals

(c)

5 10 15 20 255

1015

2025

Observed

Fitte

d

(d)

Figure 3: Plots to assess model adequacy: (a) Normal QQ plot, (b) residualsversus age, (c) residuals versus fitted, (d) fitted versus observed.

23 / 95

Page 24: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Bayes Logistic Regression

I The likelihood is

Y (x)|p(x) ∼ Binomial( N(x),p(x) ), x = 0,1,2.

I Logistic link:

log

(p(x)

1− p(x)

)= α + θx

I The prior isp(α, θ) = p(α)× p(θ)

withI α ∼ N(µα, σα) andI θ ∼ N(µθ, σθ). where µα, σα, µθ, σθ are constant that are specified

to reflect prior beliefs.

24 / 95

Page 25: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Comparison of MLE and Bayess

# MLElog i tmod <− glm ( cbind ( y , z ) ˜ x , f a m i l y =” b inomia l ” )# MLEs and SEscbind ( coef ( log i tmod ) , s q r t ( d iag ( vcov ( log i tmod ) ) ) )

[ , 1 ] [ , 2 ]( I n t e r c e p t ) −1.8076928 0.4553938x 0.4787428 0.2504594# INLAcc .mod <− i n l a ( y ˜ x , f a m i l y =” b inomia l ” , data=cc . dat , N t r i a l s =y+z )# P o s t e r i o r mean and SDcc . mod$summary . f i x e d [ c ( 1 , 2 ) ]

mean sd( I n t e r c e p t ) −1.8069628 0.4553857x 0.4800092 0.2504597

Virtually identical!

25 / 95

Page 26: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Prior Choice for Positive Parameters

I It is convenient to specify lognormal priors for a positiveparameter, for example exp(β) (the odds ratio) in a logisticregression analysis.

I One may specify two quantiles of the distribution, and directlysolve for the two parameters of the lognormal.

I Denote by θ ∼ LogNormal(µ, σ) the lognormal distribution for ageneric positive parameter θ with E[log θ] = µ andvar(log θ) = σ2, and let θ1 and θ2 be the q1 and q2 quantiles ofthis prior.

I In our example, θ = exp(β).I Then it is straightforward to show that

µ = log(θ1)

(zq2

zq2 − zq1

)−log(θ2)

(zq1

zq2 − zq1

), σ =

log(θ1)− log(θ2)

zq1 − zq2

.

(1)

26 / 95

Page 27: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Prior Choice for Positive Parameters

I As an example, suppose thatfor the odds ratio eβ webelieve there is a 50% chancethat the odds ratio is less than1 and a 95% chance that it isless than 5; with

q1 = 0.5, θ1 = 1.0, q2 = 0.95, θ2 = 5.0,

we obtain lognormalparameters

µ = 0σ = (log 5)/1.645 = 0.98.

I The density is shown inFigure 4.

0 1 2 3 4 5 6 7

0.0

0.1

0.2

0.3

0.4

0.5

0.6

x

LogN

orm

al D

ensi

ty

Figure 4: Lognormal density with 50%point 1 and 95% point 5.

27 / 95

Page 28: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Logistic Regression Example

I In the second analysis wespecify

α ∼ N(0,1/0.1)

θ ∼ N(0,W )

where W is such that the97.5% point of the prior islog(1.5), i.e. we believe theodds ratio lies between 2/3and 3/2 with probability 0.95.

I The marginal distributions aredisplayed.

−2.5 −2.0 −1.5 −1.0 −0.5 0.0

0.0

0.4

0.8

1.2

PostDens [(Intercept)]

Mean = −1.323 SD = 0.29

−0.5 0.0 0.5 1.0

0.0

1.0

2.0

PostDens [x]

Mean = 0.199 SD = 0.154

Figure 5: Posterior marginals for theintercept α and the log odds ratio θ.

28 / 95

Page 29: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Comparison of MLE and Bayess

# MLElog i tmod <− glm ( cbind ( y , z ) ˜ x , f a m i l y =” b inomia l ” )# MLEs and SEscbind ( coef ( log i tmod ) , s q r t ( d iag ( vcov ( log i tmod ) ) ) )

[ , 1 ] [ , 2 ]( I n t e r c e p t ) −1.8076928 0.4553938x 0.4787428 0.2504594# INLAW <− LogNormalPriorCh (1 ,1 .5 ,0 .5 ,0 .975 ) $sigma ˆ2cc . mod2 <− i n l a ( y ˜ x , f a m i l y =” b inomia l ” , data=cc . dat , N t r i a l s =y+z ,

c o n t r o l . f i x e d = l i s t (mean . i n t e r c e p t =c ( 0 ) , prec . i n t e r c e p t =c ( . 1 ) ,mean=c ( 0 ) , prec=c ( 1 /W) ) )

cc . mod2$summary . f i x e d [ c ( 1 , 2 ) ]mean sd

( I n t e r c e p t ) −1.322757 0.2895597x 0.198683 0.1535503

Big changes!

29 / 95

Page 30: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

GLMMs

30 / 95

Page 31: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Smoothing

When faced with estimation n different quantities of the prevalenceunder different conditions, there are three model choices:

I The true underlying prevalence risks are ALL THE SAME.I The true underlying prevalence risks are DISTINCT but not

linked.I The true underlying prevalence risks are SIMILAR IN SOME

SENSE.

The third option seems plausible when the conditions are related, buthow do we model “similarity”?

31 / 95

Page 32: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Smoothing

There are a number of possibilities for SMOOTHING models:

I The prevalences are drawn from some COMMON probabilitydistribution, but are not ordered in any way. We refer this as theindependent and identically distributed, or IID model. We couldthink of this as saying we think the prevalences are likely to be ofthe same order of magnitude.

I The prevalences are CORRELATED over time.

These are both examples of HIERARCHICAL or RANDOMEFFECTS MODELS — a key element is estimating the SMOOTHINGPARAMETER.

32 / 95

Page 33: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Smoothing over Time

Rationale and overview of models for temporal smoothing:I We often expect that the true underlying prevalence in a study

region will exhibit some degree of smoothness over time.I A linear trend in time is unlikely to be suitable for more than a

small number of years, and higher degree polynomials canproduce erratic fits.

I Hence, local smoothing is preferred.I Splines and random walk models have proved successful as

local smoothers.I And to emphasize again, in either approach, the choice of

smoothing parameter is crucial.

33 / 95

Page 34: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Random Walk ModelsWe use random walk models which encourage the mean responses(e.g., prevalences) across time to not deviate too greatly from theirneighbors.

The true underlying mean of the prevalence at time t is modeled as afunction of its neighbors:

µt | µNE(t) ∼ N(mt , vt ),

whereI µt is the mean prevalence (or some function of it such as the

logit) at time t .I µNE(t) is the set of neighboring means – with the number of

neighbors chosen depending on the model used – typically 2 or4.

I mt is the mean of some set of neighbors – for a first orderrandom walk or RW1 it is simply 1

2 (µt−1 + µt+1).I vt is the variance, and depends on the number of neighbors – for

the RW1 model it is σ2/2, where σ2 is a smoothing parameter –small values give large smoothing.

34 / 95

Page 35: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Random Walk Models

I The smoothing parameter σ2 is estimated from the data, anddetermines the extent deviations from the mean are penalized.

I The penalty term for the RW1 model is:

p(µt | µt−1, µt+1, σ2) ∝ exp

{− 1

2σ2

[µt − 1

2 (µt−1 + µt+1)]2}

.

I Hence:I Values of µt that are close to 1

2 (µt−1 + µt+1) are favored (higherdensity).

I The relative favorability is governed by σ2 – if this variance is small,then µt can’t stray too far from its neighbors.

I Predictions from the RW1 are

µT+S|µ1, . . . , µT , σ2 ∼ N(µT , σ

2 × S).

35 / 95

Page 36: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

●●●●●●

−2

0

2

0 1 2 3 4 5 6Time

Mea

n R

espo

nse

First Order Random Walk

Figure 6: Illustration of the RW1 model for smoothing at time 3. The mean ofthe smoother is the average of the two adjacent points (and is highlighted as•), and deviations from this mean are penalized via the normal distributionshown in red.

36 / 95

Page 37: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

RW1 ModelI Form of the prior density is:

π(µ|σ2) ∝ exp

(− 1

2σ2

T−1∑t=1

(µt+1 − µt )2

)

= exp

(− 1

2σ2

∑t∼t′

(µt − µt′)2

)= exp

(−1

2µTQµ

)where t ∼ t ′ indicates t is a neighbor of t ′ and the precision isQ = R/σ2 with

R =

1 −1−1 2 −1

−1 2 −1. . . . . . . . .

−1 2 −1−1 1

and zeroes everywhere else.

I This sparsity leads to big gains in computational efficiency.37 / 95

Page 38: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

RW2 Model

I The second order RW (RW2) model produces smoothertrajectories than the RW1, and has more reasonable short termpredictions, which is desirable for modeling child prevalence.

I In terms of second differences:

(µt − µt−1)− (µt−1 − µt−2) ∼ N( 0, σ2 ),

showing that deviations from linearity are discouraged.I Forecasts S steps ahead have a normal distribution with mean:

E[µT+S | µ1, . . . , µT ] = µT + S(µT − µT−1)

which is a linear function of the values at the last two time points.I The variance is

var(µT+S | µ1, . . . , µT ) =σ2

6× S(S + 1)(2S + 1)

which is cubic in the number of periods S, so blows up veryquickly.

38 / 95

Page 39: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

RW2 Model

I Form of the prior density is:

π(µ|σ2) ∝ exp

(− 1

2σ2

T−2∑t=1

(µt+2 − 2µt+1 + µt )2

)

= exp

(−1

2µTQµ

)where the precision is Q = R/σ2 with

R =

1 −2 1−2 5 −4 1

1 −4 6 −4 11 −4 6 −4 1

· · · · ·1 −4 6 −4 1

1 −4 5 −21 −2 1

and zeroes everywhere else.

39 / 95

Page 40: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

0 20 40 60 80 100

68

1012

14

Time (years)

Nile

Vol

ume

(Sca

led)

Smoothing Parameter:

Very Small

Medium

Very Large

Random Walk of Order 1 (RW1) Fits

Figure 7: Nile data with RW1 fits under different priors for smoothingparameter σ−2.

40 / 95

Page 41: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

0 20 40 60 80 100

68

1012

14

Time (years)

Nile

Vol

ume

(Sca

led)

Smoothing Parameter:

Very Small

Medium

Very Large

Random Walk of Order 2 (RW2) Fits

Figure 8: Nile data with RW2 fits under different priors for smoothingparameter σ−2.

41 / 95

Page 42: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Temporal Smoothing Model Summary

We have three models:

IID MODEL:µt ∼ N(0, σ2),

smooth towards zero.RW1 MODEL:

µt − µt−1 ∼ N(0, σ2),

smooth towards the previous value.RW2 MODEL:

(µt − µt−1)− (µt−1 − µt−2) ∼ N(0, σ2),

smooth towards the previous slope.

42 / 95

Page 43: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

RW Fitting to Simulated Data

I We illustrate fitting with the RW2 model, using the simulated dataseen earlier.

I The model is:

Yt |pt ∼ Binomial(nt ,pt )pt

1− pt= exp(α + φt )

(φ1, . . . , φT ) ∼ RW2(σ2)

σ2 ∼ Prior on Smoothing Parameterα ∼ Prior on Intercept

43 / 95

Page 44: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

RW Fitting to Simulated Data

I Fit using R-INLA.

n1 <− 10p <− 0.2t ime <− seq (1 ,60)# Simulate datay1 <− rbinom ( leng th ( t ime ) , n1 , p )i n l a d f 1 <− data . frame ( y1=y1 , t ime=t ime )# Def ine modelformula1s = y1 ˜ f ( t ime , model =” rw2 ” )f i t 1 s <− i n l a ( formula1s , data= in l ad f 1 ,

f a m i l y =” b inomia l ” , N t r i a l s =n1 ,c o n t r o l . p r e d i c t o r = l i s t ( compute=TRUE) )

I On Figures 9 and 10 the fitted values are shown in red – in boththe constant prevalence and curved prevalence cases, thereconstruction is reasonable.

44 / 95

Page 45: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

0.0

0.1

0.2

0.3

0.4

0.5

0 20 40 60

Time (months)

Pre

vale

nce

Est

imat

e

n1=10

0.0

0.1

0.2

0.3

0.4

0.5

0 20 40 60

Time (months)

Pre

vale

nce

Est

imat

e

n2=20

0.0

0.1

0.2

0.3

0.4

0.5

0 20 40 60

Time (months)

Pre

vale

nce

Est

imat

e

n3=200

Figure 9: Prevalence estimates over time from simulated data, trueprevalence p = 0.2 (blue solid lines). Smoothed random walk estimates inred.

45 / 95

Page 46: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

0.0

0.2

0.4

0.6

0.8

0 20 40 60

Time (months)

Pre

vale

nce

Est

imat

e

n1=10

0.0

0.2

0.4

0.6

0.8

0 20 40 60

Time (months)

Pre

vale

nce

Est

imat

e

n2=20

0.0

0.2

0.4

0.6

0.8

0 20 40 60

Time (months)

Pre

vale

nce

Est

imat

e

n3=200

Figure 10: Prevalence estimates over time from simulated data, trueprevalence corresponds to curved blue solid line. Smoothed random walkestimates in red.

46 / 95

Page 47: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

The RNA-Seq Data: INLA AnalysisI Recall there are two replicates and so for each of N genes we

obtain two sets of counts.I For the diploid hybrid, let Yij be the number of A alleles for gene i

and replicate j , and Nij is the total number of counts, j = 1,2.I We fit a hierarchical logistic regression model starting with first

stage:Yij |Nij ,pij ∼ Binomial(Nij ,pij )

so that pij is the probability of seeing an A read for gene i andreplicate j .

I At the second stage:

log(

pij

1− pij

)= θi + εij

where εij |σ2 ∼ N(0, σ2) represent random effects that allow forexcess-binomial variation; there are a pair for each gene.

I The θi parameters are taken as fixed effects with relatively flatpriors.

I exp(θi ) is the odds of seeing an A read for gene i .I Figures 11, 12 and 13 summarize inference.

47 / 95

Page 48: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

−2.0 −1.0 0.0 0.50.0

0.51.0

1.5

PostDens [as.factor(xvar)1]

Mean = −0.832 SD = 0.23

0.5 1.5 2.5 3.5

0.00.4

0.81.2

PostDens [as.factor(xvar)2]

Mean = 2.022 SD = 0.293

−1.0 0.0 1.0

0.00.5

1.01.5

PostDens [as.factor(xvar)3]

Mean = 0.173 SD = 0.256

−1.0 0.0 0.5 1.0 1.5

0.00.5

1.01.5

PostDens [as.factor(xvar)4]

Mean = 0.43 SD = 0.232

−0.5 0.5 1.0 1.5 2.0

0.00.5

1.01.5

PostDens [as.factor(xvar)5]

Mean = 0.596 SD = 0.23

−0.5 0.5 1.0 1.5 2.0

0.00.5

1.01.5

PostDens [as.factor(xvar)6]

Mean = 0.546 SD = 0.235

0.0 1.0 2.0 3.0

0.00.5

1.01.5

PostDens [as.factor(xvar)7]

Mean = 1.363 SD = 0.283

−1.0 0.0 0.5 1.0

0.00.5

1.01.5

PostDens [as.factor(xvar)8]

Mean = 0.028 SD = 0.231

−1.0 0.0 0.5 1.0

0.00.5

1.01.5

PostDens [as.factor(xvar)9]

Mean = 0.015 SD = 0.234

Figure 11: Posterior marginals for the first 9 gene effects θi (compare withzero for evidence of cis effects). We plot 9 rather than all 10 for displaypurposes.

48 / 95

Page 49: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

5 10 15 20

−1.0

−0.5

0.00.5

1.0

rep1

PostMean 0.025% 0.5% 0.975%

Figure 12: Posterior quantiles for 20 random effects εij , which allowexcess-binomial variation.

49 / 95

Page 50: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

0 20 40 60 80 100 120

0.00

0.02

0.04

0.06

0.08

PostDens [Precision for rep1]

Figure 13: Posterior marginal for precision of random effects σ−2.

50 / 95

Page 51: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

An Informative Summary for the RNA-Seq Data

I We extract the 95% intervals and posterior medians for the logodds of being an A allele.

I Comparison with 0 (in Figure 14) gives an indication of ciseffects.

I Genes 1, 2, 5, 6, 7 show evidence of cis effects.

51 / 95

Page 52: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

−1 0 1 2

24

68

10

Log Odds

Gene

Figure 14: Posterior marginal intervals for posterior of interest θi . Genes withposterior intervals that do not include zero, show evidence of cis effects.

52 / 95

Page 53: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Approximate Bayes

53 / 95

Page 54: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Approximate Bayes Inference

I Particularly in the context of a large number of experiments, aquick and accurate model is desirable.

I We describe such a model in the context of a GWAS.I This model is relevant when the sample size in each experiment

is large.I We first recap the normal-normal Bayes model.I Subsequently, we describe the approximation and provide an

example.

54 / 95

Page 55: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Recall: The Normal-Normal Model

The model:I Prior: θ ∼ N(m, v) andI Likelihood: Y1, . . . ,Yn|θ ∼ N(θ, σ2).

Posterior p(θ|y1, . . . , yn) is normal with

var(θ|y1, . . . , yn) = [1/v + n/σ2]−1

and

E[θ|y1, . . . , yn] =m/v + yn/σ2

1/v + n/σ2

= m(

1/v1/v + n/σ2

)+ y

(n/σ2

1/v + n/σ2

)

55 / 95

Page 56: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

A Normal-Normal Approximate Bayes Model

I Consider again the logistic regression model

log(

pi

1− pi

)= α + xiθ

with interest focusing on θ.I We require priors for α, θ, and some numerical/analytical

technique for estimation/Bayes factor calculation.I Wakefield (2007, 2009) considered replacing the likelihood by

the asymptotic distribution of the MLE, to give posterior:

p(θ|θ) ∝ p(θ|θ)p(θ)

whereI θ|θ ∼ N(θ,V ) – the asymptotic distribution of the MLE,I θ ∼ N(0,W ) – the prior on the log RR. Can choose W so that 95%

of relative risks lie in some range, e.g. [2/3,1.5].

56 / 95

Page 57: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Posterior Distribution

I Under this model, the posterior distribution for the log odds ratioθ is

θ|θ ∼ N(r θ, rV )

wherer =

WV + W

.

I Hence, we have shrinkage to the prior mean of 0.I The posterior median for the odds ratio is exp(r θ) and a 95%

credible interval isexp(r θ ± 1.96

√rV ).

I Note that as W →∞ and/or V → 0 (which occurs as we gathermore data) the non-Bayesian point and interval estimates arerecovered (since r → 1).

57 / 95

Page 58: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

A Normal-Normal Approximate Bayes Model

I We are interested in the hypotheses: H0 : θ = 0, H1 : θ 6= 0 andevaluation of the Bayes factor

BF =p(θ|H0)

p(θ|H1).

I Using the approximate likelihood and normal prior we obtain:

Approximate Bayes Factor =1√

1− rexp

(−Z 2

2r),

with Z = θ√V

, r = WV+W .

58 / 95

Page 59: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

A Normal-Normal Approximate Bayes Model

I The approximation can be combined with aPrior Odds = π0/(1− π0) to give

Posterior Odds on H0 =BFDP

1− BFDP= ABF× Prior Odds

where BFDP is the Bayesian False Discovery Probability.I BFDP depends on the power, through r .I For implementation, all that we need from the data is the Z -score

and the standard error√

V , or a confidence interval.I Hence, published results that report confidence intervals can be

converted into Bayes factors for interpretation.I The approximation relies on sample sizes that are not too small,

so the normal distribution of the estimator provides a goodsummary of the information in the data.

59 / 95

Page 60: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Combination of Data Across StudiesI Suppose we wish to combine data from two studies where we

assume a common log odds ratio θ.I The estimates from the two studies are θ1, θ2 with standard

errors√

V 1 and√

V 2.I The Bayes factor is

p(θ1, θ2|H0)

p(θ1, θ2|H1).

I The approximate Bayes factor is

ABF(θ1, θ2) = ABF(θ1)× ABF(θ2|θ1) (2)

where

ABF(θ2|θ1) =p(θ2|H0)

p(θ2|θ1,H1)

andp(θ2|θ1,H1) = Eθ|θ1

[p(θ2|θ)

]so that the density is averaged with respect to the posterior for θ.

I Important Point: The Bayes factors are not independent.60 / 95

Page 61: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Combination of Data Across Studies

I This leads to an approximate Bayes factor (which summarizesthe data from the two studies) of

ABF(θ1, θ2) =

√W

RV1V2exp

{−1

2

(Z 2

1 RV2 + 2Z1Z2R√

V1V2 + Z 22 RV1

)}where

I R = W/(V1W + V2W + V1V2)

I Z1 = θ1√V1

and

I Z2 = θ2√V2

are the usual Z statistics.

I The ABF will be small (evidence for H1) when the absolutevalues of Z1 and Z2 are large and they are of the same sign.

Stephens (2017) extends the ABF approach in an interesting way.

61 / 95

Page 62: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Combination of Data Across Studies: The GeneralCase

I Suppose we have K studies with estimates θk and asymptoticvariances Vk , k = 1, ...,K .

I Assume a common underlying parameter θ.I The Bayes factor is given by

BFK =p(θ1, . . . , θK |H0)

p(θ1, . . . , θK |H1)

=

∏Kk=1(2πVk )−1/2 exp

(− θ2

k2Vk

)∫ ∏K

k=1(2πVk )−1/2 exp

(− (θ2

k−θ)2

2Vk

)(2πW )−1/2 exp

(− θ2

2Vk

)dθ

=

√√√√W

(W−1 +

K∑k=1

V−1k

)exp

−12

(K∑

k=1

θk

Vk

)2(W−1 +

K∑k=1

V−1k

)−1

62 / 95

Page 63: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Combination of Studies: The General Case

I The posterior is given by

θ|θ1, . . . , θK ∼ N(µ, σ2)

where

µ =

(K∑

k=1

θk

Vk

)(W−1 +

K∑k=1

V−1k

)−1

σ2 =

(W−1 +

K∑k=1

V−1k

)−1

63 / 95

Page 64: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Example of Combination of Studies in a GWAS

I We illustrate how reported confidence intervals can be convertedto Bayesian summaries.

I Frayling et al. (2007) report a GWAS for Type II diabetes.I For SNP rs9939609:

Pr(H0|data) with prior:Stage Estimate (CI) p-value − log10 BF 1/5,000 1/50,0001st 1.27 (1.16–1.37) 6.4× 10−10 7.28 0.00026 0.00262nd 1.15 (1.09–1.23) 4.6× 10−5 2.72 0.905 0.990Combined – – 13.8 8× 10−11 8× 10−10

I Combined evidence is stronger than each separately since thepoint estimates are in agreement.

I For summarizing inference the (5%, 50%, 95%) points for the RRare:

Prior 1.00 (0.67–1.50)First Stage 1.26 (1.17–1.36)Combined 1.21 (1.15–1.27)

64 / 95

Page 65: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Conclusions

I Computationally GLMs and GLMMs can now be fitted in arelatively straightforward way.

I INLA is very convenient and is being constantly improved.I As with all analyses, it is crucial to check modeling assumptions

(and there are usually more in a Bayesian analysis).I Markov chain Monte Carlo provides an alternative for

computation. Stan, WinBUGS and JAGS are possibilities.I The mixture models required specialized code.

65 / 95

Page 66: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

ReferencesBlangiardo, M. and Cameletti, M. (2015). Spatial and Spatio-Temporal

Bayesian Models with R-INLA. John Wiley and Sons.

Connelly, C., Wakefield, J., and Akey, J. (2014). Evolution andarchitecture of chromatin accessibility and function in yeast. PLoSGenetics. To appear.

Frayling, T., Timpson, N., Weedon, M., Zeggini, E., Freathy, R., andet al., C. L. (2007). A common variant in the FTO gene isassociated with body mass index and predisposes to childhood andadult obesity. Science, 316, 889–894.

Krainski, E. T., Gomez-Rubio, V., Bakka, H., Lenzi, A., Castro-Camilo,D., Simpson, D., Lindgren, F., and Rue, H. (2018). AdvancedSpatial Modeling with Stochastic Partial Differential EquationsUsing R and INLA. Chapman and Hall/CRC.

Rue, H., Martino, S., and Chopin, N. (2009). Approximate Bayesianinference for latent Gaussian models using integrated nestedLaplace approximations (with discussion). Journal of the RoyalStatistical Society, Series B, 71, 319–392.

Skelly, D., Johansson, M., Madeoy, J., Wakefield, J., and Akey, J.(2011). A powerful and flexible statistical framework for testing

65 / 95

Page 67: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

hypothesis of allele-specific gene expression from RNA-Seq data.Genome Research, 21, 1728–1737.

Stephens, M. (2017). False discovery rates: a new deal. Biostatistics,18, 275–294.

Wakefield, J. (2007). A Bayesian measure of the probability of falsediscovery in genetic epidemiology studies. American Journal ofHuman Genetics, 81, 208–227.

Wakefield, J. (2009). Bayes factors for genome-wide associationstudies: comparison with p-values. Genetic Epidemiology , 33,79–86.

Wang, X., Yue, Y., and Faraway, J. J. (2018). Bayesian RegressionModeling with INLA. Chapman and Hall/CRC.

66 / 95

Page 68: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Appendix: Hierarchical Modeling ofAllele-Specific Expression Data

66 / 95

Page 69: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Specifics of ASE Experiment

Details of the data:I Two “individuals” from genetically divergent yeast strains, BY and

RM, are mated to produce a diploid hybrid.I Three replicate experiments: same individuals, but separate

samples of cells.I Two technologies: Illumina and ABI SOLiD. Each of a few trillion

cells are processed.I Pre- and post-processing steps are followed by fragmentation to

give millions of 200–400 base pair long molecules, with shortreads obtained by sequencing.

I Strict criteria to call each read as a match are used, to reduceread-mapping bias.

I Data from 25,652 SNPs within 4,844 genes.

67 / 95

Page 70: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Allele Specific Expression via RNA-Seq

Additional data:I Genomic DNA is sequenced in the diploid hybrid, which has one

copy of each gene from BY and from RM.I The only difference between the genomic DNA and the main

experiment is that we expect the genomic DNA to always bepresent 50:50 (one copy each of BY and RM), whereas for themain experiment it is only 50:50 if there is no ASE.

I For both genomic DNA and RNA we obtain counts at SNPs, ateach of BY and RM.

68 / 95

Page 71: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Genomic DNA and RNA

To clarify: we have the RNA measurements which are of primaryinterest and the genomic DNA which is like a control.

For genomic DNA, every cell will have one copy of each locus from“Mom” and one from “Dad” (assuming diploid).

If we could sequence the contents of the cell perfectly, then across apopulation of billions/trillions of cells we should see exactly 50% Momalleles and 50% Dad alleles.

In reality there will be sampling noise due to differences in things likeamplification efficiency during the sequencing library preparationprocess.

69 / 95

Page 72: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Genomic DNA and RNA

For the RNA, we are measuring transcribed molecules so there mightbe x copies of Moms locus and y copies of Dads.

The key is that the same sampling noise present in the DNA data isalso present in the RNA data, because it undergoes the samesequencing library preparation process.

Actually there is one additional step for RNA which is converting tocDNA. This may add some noise as well but it is probably less thanthe combined effects of the other steps in the process.

So in this sense, using the DNA to model the sampling noise inmolecule counts can serve as a useful baseline for calibrating ourexpectations of how much deviation in 50:50 we need to see at theRNA level before we believe the difference to be interesting.

70 / 95

Page 73: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Cartoon of the Experiment

Figure 15: Mapping of RNA short reads to BY and RM.

71 / 95

Page 74: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Statistical Problem

I Aim of the Experiment: Estimate the proportion of genes thatdisplay ASE.

I Let p be the probability of a map to BY at a particular SNP.I Additionally, we would like to classify genes into:

I Genes that do not show ASE.I Genes that show:

I Constant ASE across SNPs.I Variable ASE across SNPs, i.e. p varies within gene.

Subsequently, we will examine genes displaying ASE toinvestigate the mechanism.

I A hierarchical model is feasible since we have within gene andbetween gene variability.

I Further, a mixture model is suggested, with a mixture of genesthat do not display ASE (so there p’s are 0.5) and that do displayASE.

72 / 95

Page 75: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Summaries for ASE Data

Gene Denominators

Frequ

ency

0 2000 6000 10000

010

0020

0030

0040

00

Gene Proportions

Frequ

ency

0.0 0.2 0.4 0.6 0.8 1.0

040

080

012

00

SNP Denominators

Frequ

ency

0 100 200 300 400 500

050

0015

000

SNP ProportionsFre

quen

cy0.0 0.2 0.4 0.6 0.8 1.0

010

0020

0030

00

Figure 16: Summaries for RNA BY/RM yeast data; note that 739 SNPdenominators are >500 and are not plotted.

73 / 95

Page 76: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

no ASE

(A)

fraction reads from allele 10.5

(B)

ASE

fraction reads from allele 10.5

variable

ASE

ASE

no ASE

(C)

0.0

0.2

0.4

0.6

0.8

1.0

0.0

0.2

0.4

0.6

0.8

1.0

0.0

0.2

0.4

0.6

0.8

1.0

Figure 17: Schematic of the hierarchical model.

74 / 95

Page 77: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Approach to Modelling RNASeq Data

Overview, three models fitted:1. Model 1: Two component mixture model to filter out aberrant

SNPs using genomic DNA data.2. Model 2: Using the filtered genomic DNA data, fit a hierarchical

SNP within gene model, to determine the “null” distribution ofcounts.Specifically: “wobble” in p about 0.5, and SNP “wobble” in pwithin genes.Absence of ASE is not experimentally equivalent toYi ∼ binomial(Ni ,p = 0.5) because of the steps involved in theexperiment.

3. Model 3: For the RNA Seq data develop a two-componentmixture model where each gene either displays no ASE, or ASE,with null component determined from the analysis of the genomicDNA data (Model 2).

75 / 95

Page 78: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Model 1: Filtering Model for Genomic DNA

Two-component mixture model for SNPs:1. Majority of SNP counts arise from a beta-binomial distribution

with p “close to” 0.5 (component 1).2. Minority of SNP counts arise from a beta-binomial distribution

with p “not close to” 0.5 due to sequencing bias at these SNPs(component 2).

I Data: yj and Nj are counts at SNP j for j = 1, . . . ,m SNPs.I Note: Ignores gene information – don’t want to impose too much

structure at this point.I SNPs that are more likely to arise from component 2 are then

removed from further analyses.

76 / 95

Page 79: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Filtering Model for Genomic DNA

I Stage 1: SNP Count Likelihood:

yj |pj ∼ binomial(Nj ,pj ), j = 1, ...,N.

I Stage 2: Between-SNP Prior:

pj |a,b, c, π0 =

{beta(a,a) with probability π0beta(b, c) with probability 1− π0

I Stage 3: Hyperpriors: Constrain b < 1, c < 1 to give U-shapedbeta distribution.

a ∼ lognormal(4.3,1.8)?

b ∼ uniform(0,1)

c ∼ uniform(0,1)

π0 ∼ uniform(0,1)

?80% interval for p : [0.43,0.57]. Separate a,b, c, π0 for eachtechnology.

77 / 95

Page 80: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Implementation for Genomic DNA

I Integrate pj from model to give:

yj |a,b, c, π0 ∼ π0 × beta-binomial(Nj ,a,a)

+ (1− π0)× beta-binomial(Nj ,b, c).

I This is a mixture of two distributions:1. The first distribution is for the majority of signals close to 0.5. The

size of a denotes how close is close.2. The second distribution is for the minority of aberrant SNPs.

78 / 95

Page 81: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Implementation for Genomic DNA

I Likelihood:

Pr(y |a, b, c, π0) =N∏

j=1

(Nj

Yj

) {π0

Γ(2a)

Γ(a)2

Γ(yj + a)Γ(Nj − yj + a)

Γ(Nj + 2a)

+ (1− π0)Γ(b + c)

Γ(b)Γ(c)

Γ(yj + b)Γ(Nj − yj + c)

Γ(Nj + b + c)

}I Posterior:

p(a,b, c, π0|y) ∝ Pr(y |a,b, c, π0)× p(a)p(b)p(c)p(π0).

I Implementation: Markov chain Monte Carlo.I Recall: Sequencing bias lead to aberrant SNPs, and these errors

are likely to be repeated in the main experiment.I SNPs falling in the second mixture component were removed from

further analyses.

79 / 95

Page 82: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Posterior Distributions

a

Frequ

ency

260 280 300 320

050

100

150

b

Frequ

ency

0.60 0.65 0.70 0.75

050

100

200

c

Frequ

ency

0.60 0.65 0.70

050

100

200

300

π0

Frequ

ency

0.935 0.940 0.945

050

100

150

200

Figure 18: Posteriors for genomic filtering model for Illumina platform.

80 / 95

Page 83: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Posterior Filter

posterior probability of biased

Frequ

ency

0.0 0.2 0.4 0.6 0.8 1.0

050

0010

000

1500

020

000

Figure 19: Posterior probabilities of biased genomic DNA SNPs: 1,295removed from 25,262.

81 / 95

Page 84: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Effect of Filtering

Unfiltered

pi

Freque

ncy

0.0 0.2 0.4 0.6 0.8 1.0

0200

400600

800100

0120

0

Filtered

piFre

quency

0.0 0.2 0.4 0.6 0.8 1.0

0200

400600

800100

0120

0140

0Figure 20: Original and filtered data, for Illumina platform.

82 / 95

Page 85: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Model 2: Calibration Model for Genomic Data

I With aberrant SNPs removed, the next step is to calibrate thenull component.

I Stage 1: Within-Gene Likelihood:

Yij |pij ∼ binomial(Nij ,pij ).

where pij is the probability of an outcome from the first geneticbackground.

I Stage 2: Within-Gene Prior:

pij |αi , βi ∼ beta(αi , βi )

so that αi , βi determine the distribution of variants within gene i .

83 / 95

Page 86: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Calibration Model for Genomic DataI αi and βi are not straightforward to interpret.I We reparameterize (αi , βi )→ (pi ,ei ) with mean and dispersion

parameters (recall αi + βi is a prior sample size):

pi =αi

αi + βi

ei =1

1 + αi + βi

I Moments of ASE parameters:

E [pij |pi ,ei ] = pi

var(pij |pi ,ei ) = pi (1− pi )ei

I Moments of data:

E [Yij |pi ,ei ] = Nijpi

var(Yij |pi ,ei ) = Nijpi (1− pi )[1 + (Nij − 1)ei

]I As ei → 0 we approach the binomial model.I As ei → 1 we have more overdispersion (variability within gene).

84 / 95

Page 87: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Calibration Model for Genomic data

I Stage 3: Within-Gene Likelihood:

pi |a ∼ beta(a,a)

ei |d ∼ beta(1,d)

Note: prior on within-gene dispersion is monotonic decreasingfrom 0 (corresponding to no variability).

I Stage 4: Hyperpriors: Require priors on a > 0,d > 0.I We take

a ∼ lognormal(4.3,1.8)

d ∼ exponential(0.0001)

I The latter prior determines the within-gene variability within-genevariability in genomic DNA – chosen by examination of resultantpij ’s.

I Separate a,d for each technology.

85 / 95

Page 88: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

a

Frequ

ency

1000 2000 3000 4000 5000

020

040

060

080

0

d

Frequ

ency

500 550 600 650

010

020

030

040

050

060

0Figure 21: Posteriors for the RNA-Seq data, Illumina platform.

86 / 95

Page 89: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Model 3: Model for RNA-Seq Data

I Data are modeled as a two-component mixture: the first “null”component having a known distribution, from the genomic DNAanalysis on the filtered data.

I Stage 1: Within-Gene Likelihood:

Yij |pij ∼ binomial(Nij ,pij ).

where pij is the probability of an outcome from the first geneticbackground.

I Stage 2: Within-Gene Prior:

pij |αi , βi ∼ beta(αi , βi )

so that αi , βi determine the distribution of variants within gene i .I Stage 3: Between-Gene Prior: We again reparameterize

(αi , βi )→ (pi ,ei ):

pi ,ei |f ,g,h, π0 ∼{

beta(a, a)× beta(1, d) with probability π0beta(f ,g)× beta(1,h) with probability 1− π0

with a, d from genomic DNA analysis.87 / 95

Page 90: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Stage 4: Hyperpriors: Require priors on π0, f > 0,g > 0,h > 0.I Uniform prior on π0.I f and g describe beta distribution of pi for genes displaying ASE

– want this distribution to be centered on symmetry.I Reparameterize as

q =f

f + gr =

11 + f + g

so that E [pi ] = q, var(pi ) = q(1− q)r .I Through experimentation:

q ∼ beta(100,100) r ∼ beta(1,20)

I For h, the distribution of within-gene variability in ASE:

h ∼ exponential(0.03).

I Separate f ,g,h for each technology.

88 / 95

Page 91: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

f

Frequ

ency

24 26 28 30 32 34

010

020

030

040

0

g

Frequ

ency

24 26 28 30 32 34

010

020

030

0

h

Frequ

ency

18 19 20 21 22 23 24

050

150

250

π0

Frequ

ency

0.22 0.24 0.26 0.28 0.30

010

020

030

040

0Figure 22: Posteriors for the RNA-Seq data, Illumina platform.

89 / 95

Page 92: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Rank: binomial test

Ra

nk: B

aye

sia

n m

od

el

1000 2000 3000 4000

10

00

20

00

30

00

40

00

Moreevidencefor ASE

Lessevidencefor ASE(a)

Lessconsistent

Moreconsistentbetweenmethods

CCW145! 3!

(b)

p

0.0

0.2

0.4

0.6

0.8

1.0

SCW45! 3!

(c)

SNPs/Indels

p

0.0

0.2

0.4

0.6

0.8

1.0

Illumina GAII

ABI SOLiD 1

ABI SOLiD 2

Scale

1

5

10

20

40

75

150

300

600

1200

Figure 23: Comparison of rankings from binomial test and hierarchical model.

90 / 95

Page 93: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Rank: binomial test

Ran

k: B

ayes

ian

mod

el

1000 2000 3000 4000

1000

2000

3000

4000

Moreevidencefor ASE

Lessevidencefor ASE(a)

Lessconsistent

Moreconsistentbetweenmethods

CCW145! 3!

(b)

p

0.0

0.2

0.4

0.6

0.8

1.0

SCW45! 3!

(c)

SNPs/Indels

p

0.0

0.2

0.4

0.6

0.8

1.0

Illumina GAII

ABI SOLiD 1

ABI SOLiD 2

Scale

1

5

10

20

40

75

150

300

600

1200

Figure 24: Examples of opposite conclusions: In (b) the p-value said ASEand Bayes not (large sample size, Bayes allows wobble). In (c) the p-valuesaid no ASE, Bayes analysis yes.

91 / 95

Page 94: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

●●

● ●

●●●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●● ●●

●●

● ●●

●●●

●●

●●●

● ●

● ●●●●●

●●

●●●

● ●●●●

●●

●●● ●

●●

●●

● ●●●

●●

●● ●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

● ●

●●

●●

●●

●●●

●●●●●●

● ●●●

●●

●●

●●●

●●●●

●● ●●●

●●

●●●

●●

●●

●●

●●●●

●●

●●●

●●

●●

●●

●●

●●●

●●●

●●

●●

●●●

●●

●● ●●●

●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

● ●●

●●

●●

●●

●●

●●

● ●●●

●●

●●

●●

●●●●●

●●

●●

●●

●●

●●

●●●●

●●

●●

● ●●

●●

●●●

● ●●● ●

●●●●

●● ●

● ●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

● ●●●

●● ●

●●●

●● ●

● ●●●

●●

●●

●●

●●

●●

●●

●● ●

●●

● ●

●●

●●

●●

●●●

●●

●●

●●

●●

● ●

●●●

●● ●

●● ●

●●●

● ●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●● ●

●●

●●

●●

●● ●● ●

● ●

●●● ●

●●●

● ● ●

●● ●●

●●●● ●

●●

● ●

● ●●

●●

●●

●●

●●

●●●

●●●

●●

●●

●●

●●

●●

●●●

●●

●●●●

●●

●●

●●

●●

●●

● ●

●●

●● ●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●●

●●●

●●

● ●

●●

●●

●●

● ●

●●●

●●

●●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●●●

●●

●● ●● ●

●●

● ●

●●

●●●●

●●

●●

●●●

●●●● ●●

● ●●● ●●

● ●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●●

●●

●●●●

●●

●●

●●

●●

●●

●●●

●●●

●●●

●●

●●●●

●●

●●

●●●●●●

●●

●●

●●●●

●●

●●

●●●●

● ● ●

●●●●

●●

●●●

●●

●●

●●●

●●

●●

●●●●●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●●●

●●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●

● ●●●

●●●●●

●●●●●

● ●

●●

●●●

● ●

●● ●● ●●● ●

●●●●

●●●

●●●● ●

●●

●●●●●

●●●

●●

●●● ●

●●

●●

●●●

●●●●

● ●

●●●

●●

●●

●●●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●● ●●●

●●●●

● ●

●● ●

● ●

●●

● ●●●●

● ●

● ●

●● ●

●●

●●

● ●

●●

●● ●

●●

●●●

●●

●●

●●

●●

●●●

●●●● ●●

●●

●●

●●

●●●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

● ●

●●

● ●

●●

●●

●●

●●●

●●●●●

●●

●●

●●

●●●●

●●●●

●●

●●

●●●

●●●●

●●

●●

●●●

●●●●

●●

● ●

●●●

●●

●●

●●

●●

●●

●●

●●●●

●●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

● ●

●●

●●

●●●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

● ●●●●●

●●

●●

● ●●● ●

●●

●●

●●

●● ●

●●●

●●●

●●● ●

●●

●● ●

●●

●●

●●

●●

●●

●●●●

●● ●

●●

● ●

●●

●●

●●●●

●●●

●●

● ●●

●●

●●

●●

●●●

●●●

●●●

●●●

●●

●●

●●

●●

●●● ●

●●

● ●

●●●

●●

●●●●

●●

●●

●●● ●●

●●●●

●●

●●● ●●●● ●●

●●●●● ●● ●

●●

●●

●● ●

●●●●

●●●

●●

●●

●●● ●

●●

● ●●

●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●●●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●●

●●

●●

● ●

●●

●●

●●

●●

●●●●●●●

●●

●●●

●●

● ●

●●

●●

●●

●●

●●

● ●

●●

●●●

●● ●●

●●

● ●

●●

● ●

●●

●●

●●●

●●

● ●●

●●

●●●

●●

●●

●●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●

●●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●●

●●

●● ●

●● ●

●●●

●●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●

●●●

●●●

0.0 0.2 0.4 0.6 0.8 1.0

0.00.2

0.40.6

0.81.0

pi

Post

Prob A

SE

●●●●●●

●●

●●●

●●●●

●●●

●●● ●●

●●●

●● ●● ●

●●

●●●

●●●

● ●●

●●●

●●

●●●●●●●

●●● ●●

●●

●●

●●

●●●

● ●●

●●

●●

●●

●●●●●

●●●●●

●●●

●●●●

● ●●

●● ●

●●●

●● ●

●●

●● ●●

●●●●●

● ●●●●

●●●●●●●●

●●●●

●●●

●●●●

●●

●●

●●●●

●● ●●●

●●

●●●

●●●●●●

●●●

●●●●

● ●

●●

● ●●●●●●●●●

●●

●●●●

●●

●●

●●●●● ●●●●

●●●● ●●●●

●●

●● ●●

●●● ●●●●●●●●●●

●●●●●

●●●●●●●●●●●●

●●

●●●

●●

● ●●●

●●

●●

●●● ●

● ●●●

● ●●●●●●●

●●

●●

●●

●●

●●●●

● ●●

●●

●●●●

●●●●

●●●

●●●●

●●●

●●●●●●●

● ●●

●●● ●●●

●●●

●●●●● ●●●●

●●●●●●●●●●●

●●

●●●●●●●

●●

●●

●● ●●●●

●●●

●●

●●●●

● ●●

●●●●●●●●●

●●

●●

●●

●●●

●●●●●●●●●●

●●●●●●

●●

●●●●

●●●●

●●●●●●●● ●

● ●

●●●●●●●

●●●●●

● ●

●●

●● ●

●●●

●●

●●

●●

●●

●●●

●●●

●●

●●●●●

●●●

●●

●●

●●●●

●●●

●●

●●●●

●●

●●

●●●

●●

●●●

●●●

●●

●●

●● ●

●●●●●●● ●●

●●

●●

●●●●●●●

●●

●●●●●

●●

●●

●●●●

●●●

●●●

●●

●●

●●

●●●

●●

●●●●●

●●●

●●

●●

●●

● ●●

●●

●●

●●● ●● ●

●●●

●●

●●●●

●●●

●●

●●●

●●

●●●●●●●●

●●

●●●●●

●●●●

●●●

●●

●●

●●●

●●●

●●● ●

●●●

● ●●●●●●●

●●●●●●

●●

● ●

●●

● ●●●

●●●●●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●● ●

●●●

●●

●●●

●●

●●

●●●

●●●●

●●●

●●●●

●●

●●●

●●●

●●●●●

●●●●●●●●●● ●●

●●●●●●●●●● ●●●

●●●●●●●●

●●●

●● ●

●●●●

●●●●●

●●●●●●

●●

●●

●●●

●●●●●

● ●

●●●●

●●●●

●●●

●●●●●●

●● ●

●●

●●

●●

●●●●●●

●●●

●●

●●●●●●●●●●

●●

●●●●●● ●●●●

●●

●●

●●●

●●●●

●●

●●

●●

●●●

●●●●●●

●●●●●●

●●●

●●●●

●●

●●

●●

●●

● ●

●●●●

●●●●

●●●●

●●●

●●●

●●●

●●

●●●

●● ●●●

●●●●

●●●

●●●●●●●

● ●

●●

●●

●●●

●●●

●●

●●

●●●●●●

●●●●●

●●

●●

●●●

●●

●●●

●●

●●

●●●●●

●●

●●●●●●●

●●●●

●●

●●

●●●

●●

●●

●● ●●●●● ● ●●●●●

●●●

●●●●●●

●●●● ●●

●● ●

●●●

●●●●

●●

●●

●●

●●●●

●●

● ●●●●●

●●

●●●

●●

●●●●●●

●●●●●●●

●●●

●●●●

●●

●●

●● ●

●●●

● ●

●●●●

●●●●

●●

●●

●●●

●●●

●●

●●●

●●

●●

●●●●●

●●●●●●●●●●●●

●●

●●

●●●

●●●●●

●●●●●

●●●

●●

●●●●●●●●

●●

●●●●

●●●●●

●●●●●●●●●●

●●●●●●● ●●

●●●

●●●●

●●

●●

●●

●●●

●●

●●●●●

●●

●●

●●

●●●●

●●

●●●●●●

●●●

●●

●●

●●

●●●

● ●

●●

●●

●●

●●●

●●●●●●●●●●●●●●●

●●●●●●●●●

●●●

●●●●●●●●●●

●●

●●

●●●●●●●●●●●●●●●●●

●●●●

●●●

●●

● ●●

●●

●●●●●●●●

●●

●●●●●●

●●●

●●

●●

●●●

●●●●●

●●●●

●●●●●

● ●●

●●●

●●●●●●●

●●

●●●●●●●

●●

●●

●●● ●

●●●●●

●●●●

●●

●●

●●●

●●●●●

●●●

●●●●

●●●●●●●●●

●●

●●●●●●●●

●●

●●●●●

●●

●●●●●

●●●●●●

●●●●●

●●●●●●● ●●●

●●●●

●●

●●●●

●●●●●●

●●●●●

●●●●●●●●●

●●●●●●●●●

● ●●●●●●

●●●

●●●●

●●●●●

●●

●●

●●

●●●●

●●●

●●● ●

●●

●●●●●

●●

● ●●●●●

●●

●●●●●●

●●

●●

●●

●● ●

●●●●●●●●

●●●

●●

●●●

●●

● ●

●●

●●●●

●●●

●●●●●

●●●●●●●●●●

●●●●

●●●●

●●●

●●●●●●●●

●●●

●●●●●●●

●●●●●●●

●●●●●● ●●

●●

●●●●●●●●●

●●●●

●●●●●●●●

●●

●●

●●●●

●●●

● ●

●●

●●●●●●●

●●

●●

●●

●●

●●●●

●●●

●●

●●

●●

●●

●●●●●● ●

●●

●●●●●

●●●●●●●●●●●

●●

●●●●

●●●●

●●

●●

●●●●●●

●●

●●

●●

●●

●●

●●

●●

●●●●●

●●

●● ●●

●●●●

●●●

●●

●●

●●●●

● ●

●●

●●●●●

●●●●

●●●●●

●● ●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●● ●

●●●

●●

●●

●●

●●●●

●●●●

●●●●

●●●●●

●●

●● ●

●●

●●●●

●● ●

●●

●●

●●

●●

●●●

●●

●●●

● ●

●●

●●

●●●

●●

●●●

●●●

●●

●●●●●

● ●●

●●●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●●

●●●

●●●

●●●

●●

●●●

●●●

●● ●

●●

●●●

●●●●

●●

●●●●●●●●●

●●●●●

●●●●●

●●

●●●●

●●

●●

●●●

●●

●●● ●

●●●●●●

●●

●●●●●●●

●●●●

●●

●●

●●●●●●●

●●●●

●●●

●●

●●●●●●●●●●●●●

●●●

●●●

●●

●●●

●●

●●

●●●●

●●●●●●●●

●●

●●

●●●●

●●●●●●●

●●

●●

●●●

●●

●●●●●●●

●●●

●●

●●

●●●

●●●

●●

●●●

●●●●

●●

●●

●●●

●●

●●●

●●●●●●●●●●

●●

●●

●●

●●

●●

●●

●●●●●●

●●

●●●

●●●●

●●

●●

●●●●●●

●●●●●●●●

●●

●●●

●●●●

●●●

●●

●●●●●

●●●

●●●●

● ●●●●●●

●●●●

●●●●●●●

●●●

●●●

●●

●●●●●

●● ●

●●

●●●

● ●●●

● ●●●●●●●

●●

●●●

●●●●

●●●

●●

●●

●●

●●●●

●●●

●●●

●●

●●●●

●●●●●

●●●●●●

●●

●● ●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●●

●●●●

●●

●●

●●

●●●

●●●

●●

● ●●●

●●●●

●●

●●●●●●●●●

● ●

●● ●●

●●●●

●●

●●●

●●●

●●

●●●●●●●●●●

●●●

●●●

●●

●●●●●●

●●

●●●

●●

●●●

●●

●●●● ●

●●

●●

●●●

●●●●●●●●

●●

●●

●●

●●●● ●●

●●●●

●●●

●●

●● ●

●●●

●●●●●●

●●

●●●●

●●●

●●

●●●●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●●

●●

●●●●●

● ●●●

●●●●●●

●●●●●●●●

●●●

●●

●●

●●

●●●●●

●●●

● ●●●

● ●

●●

●●●

●● ●

●● ●

●●●●

●●●

●●●●

●●

●● ●

●●

● ●●

●●●

●●

●●●

●●●●●

●●

● ●

●●● ●

●●●

●●

●●

●● ●

●●

●●●●●●●●

●●

●●

●●

●●

●●●●

●●

●●

●●●

●●●●

●●

●●●

●●●

●●

●●●●●

● ●●●

●●●●●●●●● ●

● ●● ●●

●●●

●●

●●

●●●●

●●●●●●

●●

●●●

●●

●●

●● ●●●

●●

●●●●●

●●●●

●●●●

●●

●●

●●

●●

●●

●●●

● ●●●●

●●

●●●

●●

●●●

●●

●●●●

●●●●●●●●

●●●

●●●●●●

●●●●

● ●

●●●●

●●●

●●

●●

●●●●

●●●

●●●

●●●●●●

●●●●

●●

●●●●●●●●●●●●

●●

● ●

●●●●●●●●

●●

●●

●●

●●

●●●●

●●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●●●

●●●●●●

●●●●●

●●

●●●

●●

●●●

●●

●●

●●●●●

●●

●●●●

●●●●●●

●●

●●

●●

●●●●

●●● ●●

●●

●●

●●●●

●●●

●●●●

●●●●

●●

●●

●●●●●●

●●

●●

●●

●●●●

●●

●●●●●●●●●

●●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●●

●●

●●

● ●●

●●

●●●

●●●

●●●●●●●●

●●●●●●●●●●●●●●●●

●●●●●●●●

●●●

●●●●●

●●●

●●●

●●●●●●

●●

●●●●

●●

●●●

●●●●

●●●

●●

●●

●●

●●●●●●

●●

● ●● ●

●●

●●●

●●

●● ●

●●●

●●

●●

●●

●●

●●

●●●

● ●●●●

●●●●

●●

●●●●●

●●●●

●●

●●●●●●●●

●●●

●●●●●●●●●●●

●●●●●●

●●

●●

●●

●●

●●●●

●●●●

●●●●●

●●

●●●

●●●●●

●●●●●●●●●●

●●●●●●

●●

●●●

0.0 0.2 0.4 0.6 0.8 1.0

0.00

0.10

0.20

pi

e i

All individuals

ei

Frequ

ency

0.00 0.10 0.20 0.30

050

015

0025

00

ASE individuals only

ei

Frequ

ency

0.00 0.05 0.10 0.15 0.20 0.25

050

010

0015

0020

00Figure 25: Between-gene variability pi and within-gene variability ei .

92 / 95

Page 95: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Varying ASE within genes

I One mechanism: Imagine a gene with an exon and an intron,and that we have SNPs in both.

I At each exonic SNP we see approximately the same number ofBY and RM reads.

I Now suppose the intron is not spliced out for the BY allele, but itis spliced out efficiently for the RM allele. At each intronic SNPwe will still see the same number of BY reads as in the exon(everything else being equal), but approximately 0 RM reads,leading to variable ASE across the gene

I In the figure: The “thin” part of the gene (YML024W) is an intron,while the “thick” part is an exon.

I For the RM allele (magenta) the intron is not spliced out, while itis mostly spliced out in the BY allele (green).

93 / 95

Page 96: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

ScalechrXIII:

BY/RM SNPs

Conservation

500 bases225700 225800 225900 226000 226100 226200 226300 226400 226500 226600 226700 226800

YML024W

Figure 26: Example of a gene displaying variable ASE within a gene. Green =RM, magenta = BY.

94 / 95

Page 97: 2019 SISG Module 8: Bayesian Statistics for Genetics Lecture 7: …faculty.washington.edu/kenrice/sisgbayes/Lecture7-JW... · 2019. 7. 16. · Motivating Example: RNA Seq with Replicates

Conclusions for Mixture Model

I For the ASE data we used the DNA experiment to calibrate theprior.

I More details of this experiment and the model can be found inSkelly et al. (2011).

I Implementation was via Markov chain Monte Carlo, but we hadto write our own code.

95 / 95