
Modern Computational Statistics

Lecture 20: Applications in Computational Biology

Cheng Zhang

School of Mathematical Sciences, Peking University

December 09, 2019

Introduction 2/23

- While modern statistical approaches have been quite successful in many application areas, there are still challenging areas where complex model structures make it difficult to apply those methods.

- In this lecture, we will discuss some of the recent advances in statistical approaches for computational biology, with an emphasis on evolutionary models.

Challenges in Computational Biology 3/23

(Figure adapted from Narges Razavian, 2013.)

Phylogenetic Inference 4/23

The goal of phylogenetic inference is to reconstruct the evolutionary history (e.g., phylogenetic trees) from molecular sequence data (e.g., DNA, RNA, or protein sequences).

Molecular sequence data:

Taxa       Characters
Species A  ATGAACAT
Species B  ATGCACAC
Species C  ATGCATAT
Species D  ATGCATGC

(Figure: a phylogenetic tree relating species A, B, C, and D.)

There are lots of modern biological and medical applications: predicting the evolution of influenza viruses to help vaccine design, etc.

Example: B Cell Evolution 5/23

This happens inside of you!

These inferences guide rational vaccine design.


Bayesian Phylogenetics 6/23

(Figure: observed sequences ATGAAC..., ATGCAC..., ATGCAT..., ATGCAT... sit at the leaves of a tree with topology \tau and branch lengths q; internal nodes carry unobserved ancestral states, and e denotes an edge.)

Evolution model: p(\mathrm{ch} \mid \mathrm{pa}, q_e), where q_e is the amount of evolution on edge e.

Likelihood (summing over ancestral state assignments a^i at each site i):

p(Y \mid \tau, q) = \prod_{i=1}^{M} \sum_{a^i} \eta(a^i_\rho) \prod_{(u,v) \in E(\tau)} P_{a^i_u a^i_v}(q_{uv})

Given a proper prior distribution p(\tau, q), the posterior is

p(\tau, q \mid Y) \propto p(Y \mid \tau, q)\, p(\tau, q).

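As a concrete illustration of how this likelihood is computed, here is a minimal sketch of Felsenstein's pruning algorithm for a single alignment site, assuming the Jukes-Cantor substitution model and a hypothetical 4-taxon tree (the model choice, tree shape, and all numbers are illustrative, not from the slides):

```python
import numpy as np

def jc_transition(t):
    """Jukes-Cantor transition matrix P(t) over states {A, C, G, T}."""
    same = 0.25 + 0.75 * np.exp(-4.0 * t / 3.0)
    diff = 0.25 - 0.25 * np.exp(-4.0 * t / 3.0)
    P = np.full((4, 4), diff)
    np.fill_diagonal(P, same)
    return P

def site_likelihood(node, site):
    """Felsenstein pruning for one site.
    node: leaf name (str) or (left, right, t_left, t_right).
    site: dict mapping leaf name -> state index (A=0, C=1, G=2, T=3).
    Returns p(site | tau, q) under JC69 with uniform root frequencies eta."""
    def partial(n):
        if isinstance(n, str):                  # leaf: one-hot partial likelihood
            L = np.zeros(4)
            L[site[n]] = 1.0
            return L
        left, right, tl, tr = n
        return (jc_transition(tl) @ partial(left)) * (jc_transition(tr) @ partial(right))
    return float(0.25 * partial(node).sum())    # sum over root states a_rho

# Hypothetical 4-taxon tree ((A,B),(C,D)) and one site with pattern A, A, C, C
tree = (("A", "B", 0.1, 0.1), ("C", "D", 0.2, 0.2), 0.05, 0.05)
lik = site_likelihood(tree, {"A": 0, "B": 0, "C": 1, "D": 1})
print(lik)
```

Summing this quantity over all 4^4 possible leaf patterns gives 1, since the transition matrices are stochastic and the root distribution is normalized.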

Markov chain Monte Carlo 7/23

Random-walk MCMC (MrBayes, BEAST):

- Simple random perturbations (e.g., Nearest Neighbor Interchange (NNI)) to generate new states.

Challenges for MCMC:

- Large search space: (2n - 5)!! unrooted trees for n taxa.
- Intertwined parameter space and low acceptance rates; hard to scale to data sets with many sequences.
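The (2n - 5)!! count can be checked with a few lines of code (a standalone sketch; `num_unrooted_trees` is a hypothetical helper name):

```python
def num_unrooted_trees(n):
    """(2n-5)!! = 1*3*5*...*(2n-5): the number of unrooted binary
    tree topologies on n >= 3 labeled taxa."""
    count = 1
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count

print(num_unrooted_trees(4))    # 3
print(num_unrooted_trees(8))    # 10395
print(num_unrooted_trees(50))   # ~ 2.84e74: already astronomically large
```

The double-exponential growth of this count is what makes random-walk exploration of tree space so difficult.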

Variational Inference 8/23

q^*(\theta) = \arg\min_{q \in \mathcal{Q}} \mathrm{KL}\left(q(\theta) \,\|\, p(\theta \mid x)\right)

- VI turns inference into optimization.
- Specify a variational family of distributions over the model parameters:

  \mathcal{Q} = \{q_\phi(\theta);\ \phi \in \Phi\}

- Fit the variational parameters \phi to minimize the distance (often in terms of KL divergence) to the exact posterior.

Evidence Lower Bound 9/23

L(q) = \mathbb{E}_{q(\theta)}[\log p(x, \theta)] - \mathbb{E}_{q(\theta)}[\log q(\theta)] \le \log p(x)

- The KL is intractable; we maximize the evidence lower bound (ELBO) instead, which only requires the joint probability p(x, \theta).
  - The ELBO is a lower bound on \log p(x).
  - Maximizing the ELBO is equivalent to minimizing the KL.
- The ELBO strikes a balance between two terms:
  - The first term encourages q to focus probability mass where the model puts high probability.
  - The second term encourages q to be diffuse.
- As an optimization approach, VI tends to be faster than MCMC, and is easier to scale to large data sets (via stochastic gradient ascent).
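To make the bound concrete, here is a minimal Monte Carlo illustration on a conjugate Gaussian toy model, where the exact posterior and evidence are available in closed form (the model and all numbers are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)

# Model: theta ~ N(0,1), x | theta ~ N(theta, 1); observe x = 2.0.
# Exact posterior: N(x/2, 1/2); log evidence: log N(x; 0, 2).
x = 2.0
log_evidence = -0.5 * (np.log(2 * np.pi * 2.0) + x**2 / 2.0)

def elbo(m, s, n=100_000):
    """Monte Carlo estimate of the ELBO for q(theta) = N(m, s^2)."""
    theta = m + s * rng.standard_normal(n)
    log_joint = (-0.5 * (np.log(2 * np.pi) + theta**2)            # log p(theta)
                 - 0.5 * (np.log(2 * np.pi) + (x - theta)**2))    # log p(x|theta)
    log_q = -0.5 * (np.log(2 * np.pi * s**2) + (theta - m)**2 / s**2)
    return np.mean(log_joint - log_q)

print(elbo(0.0, 1.0), "<=", log_evidence)            # loose bound for a poor q
print(elbo(1.0, np.sqrt(0.5)), "=", log_evidence)    # tight at the exact posterior
```

When q equals the exact posterior, log p(x, theta) - log q(theta) = log p(x) pointwise, so the bound is tight (and the estimator has zero variance).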

Subsplit Bayesian Networks 10/23

Inspired by previous work (Hohna and Drummond 2012; Larget 2013), we can decompose trees into local structures and encode the tree topology space via Bayesian networks!

(Figure: two example tree topologies on taxa A, B, C, D and their subsplit decompositions over SBN nodes S1, ..., S7: e.g., root subsplit ABC|D with descendant subsplit A|BC, or root subsplit AB|CD with descendants A|B and C|D; deterministic assignments shown with probability 1.0.)


Probability Estimation Over Tree Topologies 11/23

(Figure: the SBN over nodes S1, ..., S7; an unrooted tree can be rooted on any of its edges, and each rooting induces a different root subsplit assignment s_1.)

Rooted trees:

p_{sbn}(T = \tau) = p(S_1 = s_1) \prod_{i > 1} p(S_i = s_i \mid S_{\pi_i} = s_{\pi_i}).

Unrooted trees (summing over all rootings compatible with \tau):

p_{sbn}(T^u = \tau) = \sum_{s_1 \sim \tau} p(S_1 = s_1) \prod_{i > 1} p(S_i = s_i \mid S_{\pi_i} = s_{\pi_i}).

Tree Probability Estimation via SBNs 12/23

SBNs can be used to learn a probability distribution from a collection of trees T = \{T_1, \ldots, T_K\}, with

T_k = \{S_i = s_{i,k},\ i \ge 1\}, \quad k = 1, \ldots, K.

Rooted trees:

- Maximum likelihood estimates: relative frequencies.

  p_{MLE}(S_1 = s_1) = \frac{m_{s_1}}{K}, \quad p_{MLE}(S_i = s_i \mid S_{\pi_i} = t_i) = \frac{m_{s_i, t_i}}{\sum_{s \in C_i} m_{s, t_i}}

Unrooted trees:

- Expectation Maximization:

  p^{EM,(n+1)} = \arg\max_p \mathbb{E}_{p(S_1 \mid T,\, p^{EM,(n)})} \Big( \log p(S_1) + \sum_{i > 1} \log p(S_i \mid S_{\pi_i}) \Big)
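The relative-frequency MLE for rooted trees can be sketched as simple counting (the tree encoding and subsplit labels here are hypothetical simplifications):

```python
from collections import Counter, defaultdict

def sbn_mle(trees):
    """Relative-frequency MLE for rooted-tree SBN parameters.
    Each tree: {node: (subsplit, parent_subsplit)}, parent None at the root.
    Returns {(node, parent_subsplit, subsplit): probability}."""
    counts = defaultdict(Counter)              # (node, parent) -> subsplit counts
    for tree in trees:
        for node, (s, parent) in tree.items():
            counts[(node, parent)][s] += 1
    probs = {}
    for key, ctr in counts.items():
        total = sum(ctr.values())              # = K at the root; m_{.,t} elsewhere
        for s, c in ctr.items():
            probs[key + (s,)] = c / total
    return probs

# Hypothetical sample of 4 rooted trees on taxa A, B, C, D
trees = [{"S1": ("AB|CD", None), "S2": ("A|B", "AB|CD"), "S3": ("C|D", "AB|CD")}] * 3 \
      + [{"S1": ("ABC|D", None), "S2": ("A|BC", "ABC|D")}]
probs = sbn_mle(trees)
print(probs[("S1", None, "AB|CD")])  # 0.75
```

For unrooted trees the root subsplit S1 is unobserved, which is why the slides switch to EM.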

Example: Phylogenetic Posterior Estimation 13/23

(Figure: estimated probability vs. ground truth on log-log scales for CCD and SBN-EM on the two posterior peaks of DS1, and KL divergence to the ground truth vs. number of samples for ccd, sbn-sa, sbn-em, and sbn-em-srf.)

[Zhang and Matsen, NeurIPS 2018]

- Compared to the previous method CCD (Larget, 2013), SBNs significantly reduce the biases for both high-probability and low-probability trees.
- SBNs perform better in the weak data regime.

Example: Phylogenetic Posterior Estimation 14/23

KL divergence to ground truth:

Data set  (#Taxa, #Sites)  Tree space size  Sampled trees  SRF      CCD     SBN-SA  SBN-EM  SBN-EM-α
DS1       (27, 1949)       5.84×10^32       1228           0.0155   0.6027  0.0687  0.0136  0.0130
DS2       (29, 2520)       1.58×10^35       7              0.0122   0.0218  0.0218  0.0199  0.0128
DS3       (36, 1812)       4.89×10^47       43             0.3539   0.2074  0.1152  0.1243  0.0882
DS4       (41, 1137)       1.01×10^57       828            0.5322   0.1952  0.1021  0.0763  0.0637
DS5       (50, 378)        2.84×10^74       33752          11.5746  1.3272  0.8952  0.8599  0.8218
DS6       (50, 1133)       2.84×10^74       35407          10.0159  0.4526  0.2613  0.3016  0.2786
DS7       (59, 1824)       4.36×10^92       1125           1.2765   0.3292  0.2341  0.0483  0.0399
DS8       (64, 1008)       1.04×10^103      3067           2.1653   0.4149  0.2212  0.1415  0.1236

[Zhang and Matsen, NeurIPS 2018]

Remark: Unlike previous methods, SBNs are flexible enough to provide accurate approximations to real data posteriors!

Variational Bayesian Phylogenetic Inference 15/23

- Approximating distribution:

  Q_{\phi,\psi}(\tau, q) \triangleq Q_\phi(\tau) \cdot Q_\psi(q \mid \tau)

  where Q_\phi(\tau) is over tree topologies and Q_\psi(q \mid \tau) is over branch lengths.

- Multi-sample lower bound:

  L^K(\phi, \psi) = \mathbb{E}_{Q_{\phi,\psi}(\tau^{1:K}, q^{1:K})} \log \Big( \frac{1}{K} \sum_{i=1}^{K} \frac{p(Y \mid \tau^i, q^i)\, p(\tau^i, q^i)}{Q_\phi(\tau^i)\, Q_\psi(q^i \mid \tau^i)} \Big)

- Use stochastic gradient ascent (SGA) to maximize the lower bound:

  \phi^*, \psi^* = \arg\max_{\phi, \psi} L^K(\phi, \psi)

- \phi: VIMCO/RWS; \psi: the reparameterization trick.

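A toy numerical check of the multi-sample lower bound, using a one-dimensional Gaussian model in place of the phylogenetic model (illustrative only): the bound tightens toward the log evidence as K grows.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model standing in for p(Y, tau, q): theta ~ N(0,1), y | theta ~ N(theta,1), y = 2.
y = 2.0
log_joint = lambda th: (-0.5 * (np.log(2 * np.pi) + th**2)
                        - 0.5 * (np.log(2 * np.pi) + (y - th)**2))
# Variational distribution Q = N(0, 1), deliberately mismatched to the posterior.
log_q = lambda th: -0.5 * (np.log(2 * np.pi) + th**2)

def LK(K, n_rep=20_000):
    """Monte Carlo estimate of E log((1/K) sum_i p(y, th_i) / Q(th_i))."""
    th = rng.standard_normal((n_rep, K))
    log_w = log_joint(th) - log_q(th)                  # log importance weights
    return np.mean(np.logaddexp.reduce(log_w, axis=1) - np.log(K))

log_evidence = -0.5 * (np.log(2 * np.pi * 2.0) + y**2 / 2.0)
print(LK(1), LK(10), log_evidence)   # the bound tightens as K grows
```

K = 1 recovers the ordinary ELBO; larger K gives a tighter bound even when Q itself is a poor fit, which is one motivation for the multi-sample objective.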

Structured Parameterization 16/23

SBN parameters:

p(S_1 = s_1) = \frac{\exp(\phi_{s_1})}{\sum_{s_r \in \mathbb{S}_r} \exp(\phi_{s_r})}, \quad p(S_i = s \mid S_{\pi_i} = t) = \frac{\exp(\phi_{s|t})}{\sum_{s \in \mathbb{S}_{\cdot|t}} \exp(\phi_{s|t})}

Branch length parameters:

Q_\psi(q \mid \tau) = \prod_{e \in E(\tau)} p_{Lognormal}\left(q_e \mid \mu(e, \tau), \sigma(e, \tau)\right)

- Simple split:

  \mu_s(e, \tau) = \psi^\mu_{e/\tau}, \quad \sigma_s(e, \tau) = \psi^\sigma_{e/\tau}.

- Primary subsplit pair (PSP):

  \mu_{psp}(e, \tau) = \psi^\mu_{e/\tau} + \sum_{s \in e//\tau} \psi^\mu_s, \quad \sigma_{psp}(e, \tau) = \psi^\sigma_{e/\tau} + \sum_{s \in e//\tau} \psi^\sigma_s.

(Figure: an edge e between subsplits (W1, W2) and (Z1, Z2); PSP adds the primary subsplit parameters \psi^\mu(W1, W2 | W, Z) and \psi^\mu(Z1, Z2 | W, Z) to the split parameter \psi^\mu(W, Z).)

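A minimal sketch of the PSP-style additive location and the reparameterized Lognormal draw for a single edge (all parameter values and helper names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)

def psp_location(split_param, psp_params):
    """mu_psp(e, tau): the split parameter plus the primary subsplit terms."""
    return split_param + sum(psp_params)

def sample_branch_length(mu, sigma):
    """Reparameterized Lognormal draw: q_e = exp(mu + sigma * eps), eps ~ N(0,1),
    so gradients with respect to (mu, sigma) can flow through the sample."""
    return np.exp(mu + sigma * rng.standard_normal())

# Hypothetical parameters for one edge e = (W, Z) with two PSP terms
mu = psp_location(-2.0, [0.3, -0.1])      # -1.8
sigma = 0.25
q_e = sample_branch_length(mu, sigma)
print(q_e)   # a positive branch length; the median of this Lognormal is exp(-1.8)
```

The exponential guarantees positive branch lengths, and the additive structure lets edges with the same split share parameters across tree topologies.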

Stochastic Gradient Estimators 17/23

SBN parameters \phi. With \tau^j, q^j \overset{iid}{\sim} Q_{\phi,\psi}(\tau, q):

- VIMCO [Mnih and Rezende, ICML 2016]:

  \nabla_\phi L^K(\phi, \psi) \simeq \sum_{j=1}^{K} \left( L^K_{j|-j}(\phi, \psi) - w_j \right) \nabla_\phi \log Q_\phi(\tau^j).

- RWS [Bornschein and Bengio, ICLR 2015]:

  \nabla_\phi L^K(\phi, \psi) \simeq \sum_{j=1}^{K} w_j \nabla_\phi \log Q_\phi(\tau^j).

Branch length parameters \psi. g_\psi(\epsilon \mid \tau) = \exp(\mu_{\psi,\tau} + \sigma_{\psi,\tau} \odot \epsilon).

- Reparameterization trick. Let f_{\phi,\psi}(\tau, q) = \frac{p(Y \mid \tau, q)\, p(\tau, q)}{Q_\phi(\tau)\, Q_\psi(q \mid \tau)}. Then

  \nabla_\psi L^K(\phi, \psi) \simeq \sum_{j=1}^{K} w_j \nabla_\psi \log f_{\phi,\psi}\left(\tau^j, g_\psi(\epsilon^j \mid \tau^j)\right),

  where \tau^j \overset{iid}{\sim} Q_\phi(\tau) and \epsilon^j \overset{iid}{\sim} \mathcal{N}(0, I).
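Both estimators weight the K samples by self-normalized importance weights; a numerically stable sketch (`normalized_weights` is a hypothetical helper):

```python
import numpy as np

def normalized_weights(log_f):
    """Self-normalized importance weights w_j from log f(tau_j, q_j).
    Subtracting the max before exponentiating avoids overflow/underflow
    when the unnormalized log densities are very large in magnitude."""
    shifted = log_f - np.max(log_f)
    w = np.exp(shifted)
    return w / w.sum()

w = normalized_weights(np.array([-1002.3, -1001.1, -1005.8]))
print(w, w.sum())   # weights favor the highest-scoring sample and sum to 1
```

Naively exponentiating log densities near -1000 would underflow to zero; the max-shift makes the computation exact up to floating-point rounding.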

The VBPI Pipeline 18/23

1. Sample tree topologies \tau^1, \ldots, \tau^K from Q_\phi(\tau), e.g., by ancestral sampling for SBNs.
2. Sample branch lengths q^i \sim Q_\psi(q \mid \tau^i) for each sampled topology, e.g., from Lognormal distributions.
3. Evaluate the multi-sample lower bound L^K(\phi, \psi) on (\tau^1, q^1), \ldots, (\tau^K, q^K).
4. Perform an SGA update of (\phi, \psi), and repeat.

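Ancestral sampling from an SBN can be sketched as follows, drawing each node given its parent's sampled subsplit in topological order (the network structure and probabilities are a hypothetical toy, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(3)

def ancestral_sample(root_prob, cond_prob, children):
    """Ancestral sampling from an SBN: draw S1 from the root distribution,
    then each child node given its parent's sampled subsplit."""
    def draw(dist):
        states = list(dist)
        probs = np.array([dist[s] for s in states])
        return states[rng.choice(len(states), p=probs)]
    assignment = {"S1": draw(root_prob)}
    stack = [("S1", c) for c in children.get("S1", [])]
    while stack:
        parent, node = stack.pop()
        assignment[node] = draw(cond_prob[(node, assignment[parent])])
        stack += [(node, c) for c in children.get(node, [])]
    return assignment

# Hypothetical SBN on taxa A, B, C, D
root_prob = {"AB|CD": 0.4, "ABC|D": 0.6}
cond_prob = {("S2", "AB|CD"): {"A|B": 1.0}, ("S3", "AB|CD"): {"C|D": 1.0},
             ("S2", "ABC|D"): {"A|BC": 0.7, "AB|C": 0.3}, ("S3", "ABC|D"): {"D": 1.0}}
children = {"S1": ["S2", "S3"]}
print(ancestral_sample(root_prob, cond_prob, children))
```

Each sampled assignment of subsplits corresponds to exactly one tree topology, so this yields i.i.d. topology draws from Q_phi(tau).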

Performance on Synthetic Data 19/23

A simulated study on unrooted phylogenetic trees with 8 leaves (10,395 trees). The target distribution is a random sample from the symmetric Dirichlet distribution Dir(\beta \mathbf{1}), \beta = 0.008.

(Figure: evidence lower bound and KL divergence over training iterations for VIMCO(20), VIMCO(50), RWS(20), and RWS(50) against the exact value, and variational approximations vs. ground truth probabilities for VIMCO(50) and RWS(50).)

[Zhang and Matsen, ICLR 2019]

ELBOs approach 0 quickly ⇒ SBN approximations are flexible.

More samples in the multi-sample ELBOs could be helpful.

Performance on Real Data 20/23

(Figure: KL divergence to the ground truth over training iterations for VIMCO(10), VIMCO(20), RWS(10), and RWS(20), with and without PSP, compared to MCMC; and a scatter plot of per-tree marginal likelihood estimates, VBPI vs. GSS.)

[Zhang and Matsen, ICLR 2019]

- More samples ⇒ better exploration ⇒ better approximation.
- More flexible branch length distributions across tree topologies (PSP) ease training and improve approximation.
- Outperforms MCMC via much more efficient tree space exploration and branch length updates.

Performance on Real Data 21/23

Marginal likelihood (nats):

Data set  VIMCO(10)        VIMCO(20)        VIMCO(10)+PSP    VIMCO(20)+PSP    SS
DS1       -7108.43(0.26)   -7108.35(0.21)   -7108.41(0.16)   -7108.42(0.10)   -7108.42(0.18)
DS2       -26367.70(0.12)  -26367.71(0.09)  -26367.72(0.08)  -26367.70(0.10)  -26367.57(0.48)
DS3       -33735.08(0.11)  -33735.11(0.11)  -33735.10(0.09)  -33735.07(0.11)  -33735.44(0.50)
DS4       -13329.90(0.31)  -13329.98(0.20)  -13329.94(0.18)  -13329.93(0.22)  -13330.06(0.54)
DS5       -8214.36(0.67)   -8214.74(0.38)   -8214.61(0.38)   -8214.55(0.43)   -8214.51(0.28)
DS6       -6723.75(0.68)   -6723.71(0.65)   -6724.09(0.55)   -6724.34(0.45)   -6724.07(0.86)
DS7       -37332.03(0.43)  -37331.90(0.49)  -37331.90(0.32)  -37332.03(0.23)  -37332.76(2.42)
DS8       -8653.34(0.55)   -8651.54(0.80)   -8650.63(0.42)   -8650.55(0.46)   -8649.88(1.75)

[Zhang and Matsen, ICLR 2019]

- Competitive with the state of the art (stepping-stone), while dramatically reducing cost at test time: VBPI (1,000 samples) vs. SS (100,000 samples).
- PSP alleviates the demand for large sample sizes, reducing computation while maintaining approximation accuracy.

Conclusion 22/23

- We introduced VBPI, a general variational framework for Bayesian phylogenetic inference.
- VBPI allows efficient learning of both tree topologies and branch lengths, providing performance competitive with MCMC while requiring much less computation.
- It can be used for further statistical analysis (e.g., marginal likelihood estimation) via importance sampling.
- There are many extensions, including more flexible branch length distributions, more general models, designing adaptive transition kernels in MCMC approaches, etc.

References 23/23

- Sebastian Hohna and Alexei J. Drummond. Guided tree topology proposals for Bayesian phylogenetic inference. Syst. Biol., 61(1):1-11, January 2012.
- Bret Larget. The estimation of tree posterior probabilities using conditional clade probability distributions. Syst. Biol., 62(4):501-511, July 2013.
- Zhang, C. and Matsen, F. A. Generalizing tree probability estimation via Bayesian networks. In Advances in Neural Information Processing Systems, 2018.
- Zhang, C. and Matsen, F. A. Variational Bayesian phylogenetic inference. In Proceedings of the 7th International Conference on Learning Representations, 2019.
