
Big Data need Big Model


Stan: A platform for Bayesian inference

Andrew Gelman, Bob Carpenter, Matt Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, Allen Riddell, . . .

Department of Statistics, Columbia University, New York (and other places)

10 Nov 2014


Ordered probit

data {
  int<lower=2> K;               // number of ordinal categories
  int<lower=0> N;               // number of observations
  int<lower=1> D;               // number of predictors
  int<lower=1,upper=K> y[N];    // ordinal outcomes
  row_vector[D] x[N];           // predictors
}
parameters {
  vector[D] beta;               // regression coefficients
  ordered[K-1] c;               // cutpoints, constrained increasing
}
model {
  vector[K] theta;
  for (n in 1:N) {
    real eta;
    eta <- x[n] * beta;
    theta[1] <- 1 - Phi(eta - c[1]);
    for (k in 2:(K-1))
      theta[k] <- Phi(eta - c[k-1]) - Phi(eta - c[k]);
    theta[K] <- Phi(eta - c[K-1]);
    y[n] ~ categorical(theta);
  }
}
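In equations: writing the linear predictor as \eta_n = x_n \beta and adopting the conventions c_0 = -\infty and c_K = +\infty, the loop above computes

\Pr(y_n = k \mid x_n) \;=\; \Phi(\eta_n - c_{k-1}) - \Phi(\eta_n - c_k), \qquad k = 1, \dots, K.

The ordered type constrains c_1 < \cdots < c_{K-1}, which guarantees every theta[k] is nonnegative.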


Measurement error model

data {
  ...
  real x_meas[N];      // measurement of x
  real<lower=0> tau;   // measurement noise
}
parameters {
  real x[N];           // unknown true value
  real mu_x;           // prior location
  real sigma_x;        // prior scale
  ...
}
model {
  x ~ normal(mu_x, sigma_x);            // prior
  x_meas ~ normal(x, tau);              // measurement model
  y ~ normal(alpha + beta * x, sigma);  // outcome model
  ...
}
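In equations (alpha, beta, sigma, and y come from the elided regression portion of the program), the joint model is

x_n \sim \mathrm{N}(\mu_x, \sigma_x), \qquad x^{\mathrm{meas}}_n \sim \mathrm{N}(x_n, \tau), \qquad y_n \sim \mathrm{N}(\alpha + \beta x_n, \sigma),

so the unknown true covariates x are estimated jointly with the regression parameters instead of plugging in the noisy measurements.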


Stan overview

• Fit open-ended Bayesian models
• Specify log posterior density in C++
• Code a distribution once, then use it everywhere
• Hamiltonian No-U-Turn sampler
• Autodiff
• Runs from R, Python, Matlab, Julia; postprocessing (see the sketch below)
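As one illustration of the last bullet, a minimal sketch of driving Stan from Python via PyStan (the era's PyStan 2 API; the toy model and data here are hypothetical, not from the talk):

import pystan  # Python interface to Stan

# Toy model: posterior for a normal mean with known scale.
model_code = """
data {
  int<lower=0> N;
  real y[N];
}
parameters {
  real mu;
}
model {
  y ~ normal(mu, 1);
}
"""

sm = pystan.StanModel(model_code=model_code)            # compile the model to C++
fit = sm.sampling(data={'N': 3, 'y': [1.0, 2.0, 0.5]},
                  iter=2000, chains=4)                  # sample with NUTS
print(fit)  # posterior summaries: mean, sd, quantiles, n_eff, Rhat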


People

• Stan core (15)
• Research collaborators (30)
• Developers (100)
• User community (1000)
• Users (10000)


Funding

• National Science Foundation
• Institute of Education Sciences
• Department of Energy
• Novartis
• YouGov


Roles of Stan

• Bayesian inference for unsophisticated users (alternative to Stata, Bugs, etc.)
• Bayesian inference for sophisticated users (alternative to programming it yourself)
• Fast and scalable gradient computation
• Environment for developing new algorithms


“This week, the New York Times and CBS News published a story using, in part, information from a non-probability, opt-in survey, sparking concern among many in the polling community. In general, these methods have little grounding in theory and the results can vary widely based on the particular method used.”
Michael Link, President, American Association for Public Opinion Research


Xbox estimates, adjusting for demographics


• Karl Rove, Wall Street Journal, 7 Oct: “Mr. Romney’s bounce is significant.”
• Nate Silver, New York Times, 6 Oct: “Mr. Romney has not only improved his own standing but also taken voters away from Mr. Obama’s column.”


Xbox estimates, adjusting for demographics and partisanship


Jimmy Carter Republicans and George W. Bush Democrats


Toxicology


Earth science


Lots of other applications

Astronomy, ecology, linguistics, epidemiology, soil science, . . .


Steps of Bayesian data analysis

• Model building
• Inference
• Model checking
• Model understanding and improvement


Background on Bayesian computation

• Point estimates and standard errors
• Hierarchical models
• Posterior simulation
• Markov chain Monte Carlo (Gibbs sampler and Metropolis algorithm; see the sketch after this list)
• Hamiltonian Monte Carlo
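As a concrete reference point for the Metropolis bullet, a minimal random-walk Metropolis sketch in Python (not Stan's implementation; target density, step scale, and iteration count are illustrative assumptions):

import numpy as np

def metropolis(log_p, theta0, n_iter=5000, scale=0.5, seed=0):
    """Random-walk Metropolis: propose a Gaussian step, accept with prob min(1, p'/p)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    lp = log_p(theta)
    draws = []
    for _ in range(n_iter):
        prop = theta + scale * rng.standard_normal(theta.shape)
        lp_prop = log_p(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept/reject in log space
            theta, lp = prop, lp_prop
        draws.append(theta.copy())
    return np.array(draws)

# Example: sample a 2-D standard normal.
samples = metropolis(lambda t: -0.5 * np.sum(t**2), theta0=np.zeros(2))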


Solving problems

• Problem: Gibbs too slow, Metropolis too problem-specific
  Solution: Hamiltonian Monte Carlo
• Problem: Interpreters too slow, won’t scale
  Solution: Compilation
• Problem: Need gradients of log posterior for HMC
  Solution: Reverse-mode algorithmic differentiation (see the sketch after this list)
• Problem: Existing algo-diff slow, limited, unextensible
  Solution: Our own algo-diff
• Problem: Algo-diff requires fully templated functions
  Solution: Our own density library, Eigen linear algebra
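To show what reverse-mode algorithmic differentiation does, a toy scalar version in Python (a pedagogical sketch only; Stan's implementation is templated C++ spanning its whole math library):

import math

class Var:
    """A node in the expression graph: a value plus (parent, local-gradient) edges."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # tuples of (Var, d(self)/d(parent))
        self.grad = 0.0

    def _lift(self, other):
        return other if isinstance(other, Var) else Var(float(other))

    def __add__(self, other):
        other = self._lift(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = self._lift(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

def log(x):
    return Var(math.log(x.value), ((x, 1.0 / x.value),))

def backward(out):
    """Push adjoints from the output back through the graph (chain rule)."""
    agenda = [(out, 1.0)]
    while agenda:
        node, adjoint = agenda.pop()
        node.grad += adjoint
        for parent, local in node.parents:
            agenda.append((parent, adjoint * local))

# Gradient of log p(y | mu) = -(y - mu)^2 / 2 at mu = 0.5, y = 2: equals y - mu = 1.5.
mu = Var(0.5)
d = mu * -1.0 + 2.0
lp = d * d * -0.5
backward(lp)
print(mu.grad)  # 1.5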


Radford Neal (2011) on Hamiltonian Monte Carlo

“One practical impediment to the use of Hamiltonian Monte Carlo is the need to select suitable values for the leapfrog stepsize, ε, and the number of leapfrog steps L . . . Tuning HMC will usually require preliminary runs with trial values for ε and L . . . Unfortunately, preliminary runs can be misleading . . . ”
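To make ε and L concrete, a minimal leapfrog integrator in Python (an illustrative sketch under the usual identity-mass-matrix assumption, not Stan's implementation):

import numpy as np

def leapfrog(theta, r, grad_log_p, eps, L):
    """Simulate Hamiltonian dynamics for L steps of size eps.
    theta: position (parameters); r: momentum; grad_log_p: gradient of log posterior."""
    theta, r = theta.copy(), r.copy()
    r += 0.5 * eps * grad_log_p(theta)      # initial half step for momentum
    for step in range(L):
        theta += eps * r                    # full step for position
        if step < L - 1:
            r += eps * grad_log_p(theta)    # full momentum step between position steps
    r += 0.5 * eps * grad_log_p(theta)      # final half step for momentum
    return theta, r

# Example: standard normal target, so log p(theta) has gradient -theta.
theta, r = leapfrog(np.array([1.0]), np.array([0.5]), lambda t: -t, eps=0.1, L=20)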


The No-U-Turn Sampler

• Created by Matt Hoffman
• Run the HMC steps until they start to turn around (bend with an angle > 180°); see the sketch after this list
• Computationally efficient
• Requires no tuning of #steps
• Complications to preserve detailed balance
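The turn-around test can be written as a dot-product check between the two ends of the simulated trajectory; a Python paraphrase of the Hoffman and Gelman criterion (not Stan's actual code):

import numpy as np

def u_turn(theta_minus, theta_plus, r_minus, r_plus):
    """True when either end of the trajectory has started moving back toward
    the other end, i.e. (theta+ - theta-) . r < 0 at either endpoint."""
    span = theta_plus - theta_minus
    return np.dot(span, r_minus) < 0 or np.dot(span, r_plus) < 0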


NUTS Example Trajectory (Hoffman and Gelman)

[Figure 2 of Hoffman and Gelman: an example of a trajectory generated during one iteration of NUTS. The blue ellipse is a contour of the target distribution; the black open circles are the positions θ traced out by the leapfrog integrator and associated with elements of the set of visited states B; the black solid circle is the starting position; the red solid circles are positions associated with states that must be excluded from the set C of possible next samples because their joint probability is below the slice variable u; and the positions crossed out with a red “x” correspond to states that must be excluded from C to satisfy detailed balance. The blue arrow is the vector from the leftmost to the rightmost leaf node of the rightmost height-3 subtree, and the magenta arrow is the (normalized) momentum vector at the final state in the trajectory. The doubling process stops here, since the blue and magenta arrows make an angle of more than 90 degrees.]

• Blue ellipse is contour of target distribution
• Initial position at black solid circle
• Arrows indicate a U-turn in momentum


NUTS vs. Gibbs and Metropolis

[Figure 7 of the No-U-Turn Sampler paper: samples generated by random-walk Metropolis, Gibbs sampling, and NUTS. The plots compare 1,000 independent draws from a highly correlated 250-dimensional distribution (right) with 1,000,000 samples (thinned to 1,000 for display) generated by random-walk Metropolis (left), 1,000,000 samples (thinned to 1,000 for display) generated by Gibbs sampling (second from left), and 1,000 samples generated by NUTS (second from right). Only the first two dimensions are shown.]

From the paper's accompanying discussion (section 4.4, comparing the efficiency of HMC and NUTS): Figure 6 of the paper (not shown here) compares the efficiency of HMC, with various simulation lengths λ ≈ εL, against NUTS, which chooses simulation lengths automatically. The x-axis in each plot is the target δ used by the dual averaging algorithm to automatically tune the step size ε; the y-axis is the effective sample size (ESS) generated by each sampler, normalized by the number of gradient evaluations. HMC's best performance occurs around δ = 0.65, suggesting that this is a reasonable default for a variety of problems; NUTS's best performance occurs around δ = 0.6 but does not depend strongly on δ within δ ∈ [0.45, 0.65], so δ = 0.6 seems like a reasonable default for NUTS. On the two logistic regression problems, NUTS produces effectively independent samples about as efficiently as HMC; on the multivariate normal and stochastic volatility problems, NUTS with δ = 0.6 outperforms HMC's best ESS by about a factor of three. As expected, HMC's performance degrades if an inappropriate simulation length is chosen: across the four target distributions tested, the best λ for HMC varied by about a factor of 100 (longest optimal λ = 17.62 for the multivariate normal, shortest 0.17 for the simple logistic regression), so finding a good simulation length usually requires preliminary runs. NUTS can generate samples at least as efficiently as HMC, even discounting the cost of any preliminary runs needed to tune HMC's simulation length.

• Two dimensions of highly correlated 250-dim distribution
• 1M samples from Metropolis, 1M from Gibbs (thin to 1K)
• 1K samples from NUTS, 1K independent draws


NUTS vs. Basic HMC

• 250-D normal and logistic regression models
• Vertical axis shows effective #sims (big is good)
• (Left) NUTS; (Right) HMC with increasing t = εL


NUTS vs. Basic HMC II

• Hierarchical logistic regression and stochastic volatility
• Simulation time is step size ε times #steps L
• NUTS can beat optimally tuned HMC


Solving more problems in Stan

• Problem: Need ease of use of BUGS
  Solution: Compile directed graphical model language
• Problem: Need to tune parameters for HMC
  Solution: Auto tuning, adaptation
• Problem: Efficient up-to-proportion density calcs
  Solution: Density template metaprogramming (see the note after this list)
• Problem: Limited error checking, recovery
  Solution: Static model typing, informative exceptions
• Problem: Poor boundary behavior
  Solution: Calculate limits (e.g. \lim_{x \to 0} x \log x = 0)
• Problem: Restrictive licensing (e.g., closed, GPL, etc.)
  Solution: Open-source, BSD license
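To unpack “up-to-proportion”: MCMC needs the log target only up to an additive constant, so terms that involve no parameters can be dropped. For a normal likelihood, for example (a generic illustration, not Stan’s exact generated code),

\log p(y \mid \mu, \sigma) \;=\; -\frac{1}{2} \sum_{n=1}^{N} \left( \frac{y_n - \mu}{\sigma} \right)^2 \;-\; N \log \sigma \;-\; \frac{N}{2} \log(2\pi),

and the final term is a constant that sampling never needs; if \sigma is data rather than a parameter, the N \log \sigma term drops as well. The template metaprogramming tracks which arguments are constants so such terms vanish at compile time.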

Gelman Carpenter Hoffman Lee Goodrich Betancourt . . . Stan: A platform for Bayesian inference

Page 77: Stan: A platform for Bayesian inferencegelman/presentations/stantalk... · 2014. 11. 14. · 33/44 NUTSExampleTrajectory Hoffman and Gelman!0.1 0 0.1 0.2 0.3 0.4 0.5!0.1 0 0.1 0.2

37/44

Solving more problems in Stan

I Problem: Need ease of use of BUGS

I Solution: Compile directed graphical model language

I Problem: Need to tune parameters for HMCI Solution: Auto tuning, adaptation

I Problem: Efficient up-to-proportion density calcsI Solution: Density template metaprogramming

I Problem: Limited error checking, recoveryI Solution: Static model typing, informative exceptions

I Problem: Poor boundary behaviorI Solution: Calculate limits (e.g. limx→0 x log x)

I Problem: Restrictive licensing (e.g., closed, GPL, etc.)I Solution: Open-source, BSD license

Gelman Carpenter Hoffman Lee Goodrich Betancourt . . . Stan: A platform for Bayesian inference

Page 78: Stan: A platform for Bayesian inferencegelman/presentations/stantalk... · 2014. 11. 14. · 33/44 NUTSExampleTrajectory Hoffman and Gelman!0.1 0 0.1 0.2 0.3 0.4 0.5!0.1 0 0.1 0.2

37/44

Solving more problems in Stan

I Problem: Need ease of use of BUGSI Solution: Compile directed graphical model language

I Problem: Need to tune parameters for HMCI Solution: Auto tuning, adaptation

I Problem: Efficient up-to-proportion density calcsI Solution: Density template metaprogramming

I Problem: Limited error checking, recoveryI Solution: Static model typing, informative exceptions

I Problem: Poor boundary behaviorI Solution: Calculate limits (e.g. limx→0 x log x)

I Problem: Restrictive licensing (e.g., closed, GPL, etc.)I Solution: Open-source, BSD license

Gelman Carpenter Hoffman Lee Goodrich Betancourt . . . Stan: A platform for Bayesian inference

Page 79: Stan: A platform for Bayesian inferencegelman/presentations/stantalk... · 2014. 11. 14. · 33/44 NUTSExampleTrajectory Hoffman and Gelman!0.1 0 0.1 0.2 0.3 0.4 0.5!0.1 0 0.1 0.2

37/44

Solving more problems in Stan

I Problem: Need ease of use of BUGSI Solution: Compile directed graphical model language

I Problem: Need to tune parameters for HMC

I Solution: Auto tuning, adaptation

I Problem: Efficient up-to-proportion density calcsI Solution: Density template metaprogramming

I Problem: Limited error checking, recoveryI Solution: Static model typing, informative exceptions

I Problem: Poor boundary behaviorI Solution: Calculate limits (e.g. limx→0 x log x)

I Problem: Restrictive licensing (e.g., closed, GPL, etc.)I Solution: Open-source, BSD license

38/44

New stuff: Differential equation models

Simple harmonic oscillator:

dz1/dt = z2
dz2/dt = −z1 − θ z2

with observations (y1, y2)_t, t = 1, . . . , T:

y1t ∼ N(z1(t), σ1²)
y2t ∼ N(z2(t), σ2²)

Given data (y1, y2)_t, t = 1, . . . , T, estimate the initial state (z1, z2) at t = 0 and the parameter θ.
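In matrix form (a restatement for clarity, not on the slide), the system is linear, which makes the damping role of θ explicit:

\frac{d}{dt} \begin{pmatrix} z_1 \\ z_2 \end{pmatrix}
  = \begin{pmatrix} 0 & 1 \\ -1 & -\theta \end{pmatrix}
    \begin{pmatrix} z_1 \\ z_2 \end{pmatrix}

For θ > 0 the oscillation decays toward the origin; the simulation below uses θ = 0.15, a lightly damped oscillator.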

39/44

Stan program: 1

functions {
  real[] sho(real t, real[] y, real[] theta,
             real[] x_r, int[] x_i) {
    real dydt[2];
    dydt[1] <- y[2];
    dydt[2] <- -y[1] - theta[1] * y[2];
    return dydt;
  }
}
data {
  int<lower=1> T;    // number of observation times
  real y[T,2];       // observed (y1, y2) series
  real t0;           // initial time
  real ts[T];        // observation times
}
transformed data {
  real x_r[0];       // no real-valued forcing data
  int x_i[0];        // no integer forcing data
}

40/44

Stan program: 2

parameters {
  real y0[2];                 // initial state (z1, z2) at t0
  vector<lower=0>[2] sigma;   // observation noise scales
  real theta[1];              // damping parameter
}
model {
  real z[T,2];
  sigma ~ cauchy(0, 2.5);
  theta ~ normal(0, 1);
  y0 ~ normal(0, 1);
  z <- integrate_ode(sho, y0, t0, ts, theta, x_r, x_i);
  for (t in 1:T)
    y[t] ~ normal(z[t], sigma);
}
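A natural companion (a minimal sketch, not on the slides, using the same 2014-era <- assignment syntax) is a generated quantities block that replicates the data for posterior predictive checking:

generated quantities {
  real y_rep[T,2];    // replicated observations for predictive checks
  {
    real z_rep[T,2];
    // re-solve the ODE at the sampled parameter values
    z_rep <- integrate_ode(sho, y0, t0, ts, theta, x_r, x_i);
    for (t in 1:T)
      for (k in 1:2)
        y_rep[t,k] <- normal_rng(z_rep[t,k], sigma[k]);
  }
}

Comparing y_rep against the observed y then checks whether the fitted oscillator reproduces the data's amplitude and decay.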

41/44

Stan output

Run RStan with data simulated from θ = 0.15, y0 = (1, 0), and σ = 0.1:

Inference for Stan model: sho.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

           mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
y0[1]      1.05    0.00 0.09  0.87  0.98  1.05  1.10  1.23  1172    1
y0[2]     -0.06    0.00 0.06 -0.18 -0.10 -0.06 -0.02  0.06  1524    1
sigma[1]   0.14    0.00 0.04  0.08  0.11  0.13  0.16  0.25  1354    1
sigma[2]   0.11    0.00 0.03  0.06  0.08  0.10  0.12  0.18  1697    1
theta[1]   0.15    0.00 0.04  0.08  0.13  0.15  0.17  0.22  1112    1
lp__      28.97    0.06 1.80 24.55 27.95 29.37 30.29 31.35   992    1
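The posterior means recover the simulation settings (θ = 0.15, σ ≈ 0.1) within their intervals. For reference (standard definitions, not on the slide): n_eff is the effective posterior sample size, and Rhat is the potential scale reduction factor, which compares the between-chain variance B and within-chain variance W over m chains of n draws each; values near 1, as here, indicate the chains have mixed:

\hat{R} = \sqrt{\frac{\widehat{\text{var}}^{+}}{W}},
\qquad
\widehat{\text{var}}^{+} = \frac{n-1}{n}\, W + \frac{1}{n}\, B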

42/44

Big Data, Big Model, Scalable Computing

[Figure: log-log plot of seconds per sample (y-axis, 10^-3 to 10^0) against total ratings (x-axis, 10^2 to 10^6), with one curve each for 10 items, 100 items, and 1000 items.]

43/44

Thinking about scalability

I Hierarchical item response model:

                                          Stan             JAGS
# items  # raters  # groups     # data    time  memory    time  memory
     20     2,000       100     40,000    :02m    16MB    :03m   220MB
     40     8,000       200    320,000    :16m    92MB    :40m  1400MB
     80    32,000       400  2,560,000  4h:10m   580MB    :??m     ?MB

I Also, Stan generated 4× the effective sample size per iteration

44/44

Future work

I Programming
  I Faster gradients and higher-order derivatives
  I Functions

I Statistical algorithms
  I Riemannian Hamiltonian Monte Carlo
  I (Penalized) MLE
  I (Penalized) marginal MLE
  I Black-box variational Bayes
  I Data partitioning and expectation propagation
