Probabilistic Programming; Ways Forwardfwood/talks/2015/dali-keynote...Operative Definition “Probabilistic programs are usual functional or imperative programs with two added constructs:


Probabilistic Programming; Ways Forward

Frank Wood

Outline

• What is probabilistic programming?

• What are the goals of the field?

• What are some challenges?

• Where are we now?

• Ways forward…


What is probabilistic programming?


An Emerging Field

[Figure: three overlapping fields, with Probabilistic Programming where they meet.]

• ML: Algorithms & Applications

• STATS: Inference & Theory

• PL: Compilers, Semantics, Analysis

Conceptualization

[Figure: two pipelines compared.]

CS: Parameters, Program, Output

Probabilistic Programming / Statistics: Parameters (labelled $p(\theta \mid x)$), Program (labelled $p(x \mid \theta)\,p(\theta)$), Observations (labelled $x$)

Operative Definition

“Probabilistic programs are usual functional or imperative programs with two added constructs:

(1) the ability to draw values at random from distributions, and

(2) the ability to condition values of variables in a program via observations.”

Gordon et al, 2014


What are the goals of probabilistic programming?

Increased Productivity

(fn [x] (logb 1.04 (+ 1 x)))

[Figure: plot of lines of Anglican code against lines of Matlab/Java code for five models: HPYP [Wood 2007], DDPMO [Neiswanger et al 2014], PDIA [Pfau 2010], Collapsed LDA, and DP Conjugate Mixture. Other labels in the figure: log, lin, p(⋅|data).]

Automatic Inference

Programming Language Representation / Abstraction Layer

Inference Engine(s)

Models

[Figure: example models, shown as excerpts from papers and lecture notes: the time-varying Pitman-Yor process mixture of Caron et al.; a time-series graphical model over variables r_t, s_t, y_t; graphical models for a finite Gaussian mixture model (GMM), a Bayesian GMM, and an infinite GMM; and the graphical model and generative description of latent Dirichlet allocation (LDA).]

The Bayesian GMM excerpt reads:

$c_i \mid \vec{\pi} \sim \mathrm{Discrete}(\vec{\pi})$

$y_i \mid c_i = k; \Theta \sim \mathrm{Gaussian}(\cdot \mid \theta_k)$

$\vec{\pi} \mid \alpha \sim \mathrm{Dirichlet}(\cdot \mid \tfrac{\alpha}{K}, \ldots, \tfrac{\alpha}{K})$

$\Theta \sim G_0$

What are some challenges?

Challenges

• Unbounded recursion

• Equality and continuous variables

Unbounded Recursion

(defn geometric
  "generates geometrically distributed values in {0,1,2,...}"
  ([p] (geometric p 0))
  ([p n] (if (sample (flip p))
           n
           (geometric p (+ n 1)))))

[Figure: execution tree with outcomes 0, 1, 2, ...; each level returns with probability p and continues with probability 1-p.]
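The recursion above can be mirrored outside Anglican. What follows is a minimal Python sketch, not Anglican code: `flip` and `geometric` are hypothetical stand-ins built on Python's standard `random` module. It illustrates that the depth is unbounded while the program still terminates with probability 1, since reaching depth n has probability (1-p)^n.

```python
import random

def flip(p, rng):
    """Stand-in for (sample (flip p)): True with probability p."""
    return rng.random() < p

def geometric(p, rng, n=0):
    """Return n with probability p, otherwise continue with n + 1.

    Written as a loop so Python's recursion limit is never hit; the
    logic matches the recursive definition on the slide.
    """
    while not flip(p, rng):
        n += 1
    return n

rng = random.Random(0)
samples = [geometric(0.5, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)  # theoretical mean is (1 - p) / p
```

For p = 0.5 the sample mean should sit close to the theoretical value (1 - p) / p = 1.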

Defining Distributions

(defm pick-a-stick [stick v l k]
  "picks a stick given a stick generator
   given a value v ~ uniform-continuous(0,1)
   should be called with l = 0.0, k=1"
  (let [u (+ l (stick k))]
    (if (> u v)
      k
      (pick-a-stick stick v u (+ k 1)))))

[Figure: the unit interval divided into (stick 1), (stick 2), (stick 3), (stick 4), (…), (stick 6), with the point v marked, where v is drawn as (sample (uniform-continuous 0 1)).]

Semantics and Termination

(def infinite-loop #(loop [] (recur)))

(defn p []
  (if (sample (flip 0.5))
    1
    (if (sample (flip 0.5))
      (p)
      (infinite-loop))))

[Figure: execution tree in which every branch has probability 0.5; the terminating leaves all return 1.]

$p(x = 1) = \sum_{n=1}^{\infty} \left(\tfrac{1}{2}\right)^{2n-1} = \tfrac{2}{3}$ ?
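The series on this slide can be checked by hand. Write q for the probability that p returns 1. The program returns 1 immediately with probability 1/2 and calls itself again with probability 1/4, so, assuming the flips are independent:

```latex
\begin{align*}
q &= \tfrac{1}{2} + \tfrac{1}{4}\, q
  \quad\Longrightarrow\quad q = \tfrac{2}{3}, \\
\sum_{n=1}^{\infty} \left(\tfrac{1}{2}\right)^{2n-1}
  &= \tfrac{1}{2} \sum_{n=0}^{\infty} \left(\tfrac{1}{4}\right)^{n}
   = \tfrac{1}{2} \cdot \frac{1}{1 - \tfrac{1}{4}} = \tfrac{2}{3}.
\end{align*}
```

The remaining mass of 1/3 belongs to runs that never terminate. One reading of the question mark is therefore semantic: every terminating run returns 1, so renormalising over terminating runs would give 1 rather than 2/3.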

Equality and Continuous Variables

Why are your probabilistic programming systems anti-equality?

Equality

(defquery bayes-net []
  (let [is-cloudy  (sample (flip 0.5))
        is-raining (cond (= is-cloudy true ) (sample (flip 0.8))
                         (= is-cloudy false) (sample (flip 0.2)))
        sprinkler  (cond (= is-cloudy true ) (sample (flip 0.1))
                         (= is-cloudy false) (sample (flip 0.5)))
        wet-grass  (cond (and (= sprinkler true) (= is-raining true))
                         (sample (flip 0.99))
                         (and (= sprinkler false) (= is-raining false))
                         (sample (flip 0.0))
                         (or (= sprinkler true) (= is-raining true))
                         (sample (flip 0.9)))]
    (observe (= wet-grass true))
    (predict :s (hash-map :is-cloudy is-cloudy
                          :is-raining is-raining
                          :sprinkler sprinkler))))

Dirac Observe

(defquery bayes-net []
  (let [is-cloudy  (sample (flip 0.5))
        is-raining (cond (= is-cloudy true ) (sample (flip 0.8))
                         (= is-cloudy false) (sample (flip 0.2)))
        sprinkler  (cond (= is-cloudy true ) (sample (flip 0.1))
                         (= is-cloudy false) (sample (flip 0.5)))
        wet-grass  (cond (and (= sprinkler true) (= is-raining true))
                         (sample (flip 0.99))
                         (and (= sprinkler false) (= is-raining false))
                         (sample (flip 0.0))
                         (or (= sprinkler true) (= is-raining true))
                         (sample (flip 0.9)))]
    (observe (dirac wet-grass) true)
    (predict :s (hash-map :is-cloudy is-cloudy
                          :is-raining is-raining
                          :sprinkler sprinkler))))

$p(x \mid y = o) \propto \delta(y - o)\, p(x, y) = p(x, y = o)$

ABC Observe

(defquery bayes-net []
  (let [is-cloudy  (sample (flip 0.5))
        is-raining (cond (= is-cloudy true ) (sample (flip 0.8))
                         (= is-cloudy false) (sample (flip 0.2)))
        sprinkler  (cond (= is-cloudy true ) (sample (flip 0.1))
                         (= is-cloudy false) (sample (flip 0.5)))
        wet-grass  (cond (and (= sprinkler true) (= is-raining true))
                         (sample (flip 0.99))
                         (and (= sprinkler false) (= is-raining false))
                         (sample (flip 0.0))
                         (or (= sprinkler true) (= is-raining true))
                         (sample (flip 0.9)))]
    (observe (normal 0.0 tolerance) (d wet-grass true))
    (predict :s (hash-map :is-cloudy is-cloudy
                          :is-raining is-raining
                          :sprinkler sprinkler))))

$p(x \mid y = o) \propto p(d(y, o))\, p(x, y)$

Noisy Observe

(defquery bayes-net []
  (let [is-cloudy  (sample (flip 0.5))
        is-raining (cond (= is-cloudy true ) (sample (flip 0.8))
                         (= is-cloudy false) (sample (flip 0.2)))
        sprinkler  (cond (= is-cloudy true ) (sample (flip 0.1))
                         (= is-cloudy false) (sample (flip 0.5)))
        wet-grass  (cond (and (= sprinkler true) (= is-raining true))
                         (flip 0.99)
                         (and (= sprinkler false) (= is-raining false))
                         (flip 0.0)
                         (or (= sprinkler true) (= is-raining true))
                         (flip 0.9))]
    (observe wet-grass true)
    (predict :s (hash-map :is-cloudy is-cloudy
                          :is-raining is-raining
                          :sprinkler sprinkler))))

$p(x \mid y = o) \propto p(o \mid x)\, p(x)$
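Because the sprinkler network has only eight latent states, the noisy-observe formula p(x | y = o) ∝ p(o | x) p(x) can be evaluated exactly. The Python sketch below does this by enumeration. It is an illustration with hypothetical helper names, not how Anglican performs inference; the probabilities are copied from the query above.

```python
from itertools import product

def bernoulli(p, value):
    """Probability that a flip with bias p yields `value`."""
    return p if value else 1.0 - p

def wet_grass_prob(sprinkler, raining):
    """P(wet-grass = true | sprinkler, raining), following the cond on the slide."""
    if sprinkler and raining:
        return 0.99
    if not sprinkler and not raining:
        return 0.0
    return 0.9

def posterior():
    """Exact posterior over (cloudy, raining, sprinkler) given wet grass."""
    weights = {}
    for cloudy, raining, sprinkler in product([True, False], repeat=3):
        prior = (bernoulli(0.5, cloudy)
                 * bernoulli(0.8 if cloudy else 0.2, raining)
                 * bernoulli(0.1 if cloudy else 0.5, sprinkler))
        # p(o | x) p(x): likelihood of the observation times the prior
        weights[(cloudy, raining, sprinkler)] = prior * wet_grass_prob(sprinkler, raining)
    total = sum(weights.values())
    return {state: w / total for state, w in weights.items()}

post = posterior()
p_raining = sum(p for (c, r, s), p in post.items() if r)
p_sprinkler = sum(p for (c, r, s), p in post.items() if s)
```

Enumeration is only feasible because the state space is tiny; a sampling-based engine should approach the same marginals as the number of samples grows.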

Continuous Variables

(defquery unknown-mean []
  (let [sigma (sqrt 2)
        mu (marsaglia-normal 1 5)]
    (observe (normal mu sigma) 9)
    (observe (normal mu sigma) 8)
    (predict :mu mu)))
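This query has a closed-form answer that is useful for checking an inference engine. The Python sketch below applies the standard conjugate update for a normal mean with known noise variance. It assumes that the second argument of `marsaglia-normal` is a variance (the talk defines it with parameters `[mean var]`) and that the second argument of `normal` is a standard deviation; `normal_posterior` is a hypothetical helper.

```python
def normal_posterior(prior_mean, prior_var, noise_var, observations):
    """Conjugate update for a normal mean with known noise variance."""
    precision = 1.0 / prior_var + len(observations) / noise_var
    mean = (prior_mean / prior_var + sum(observations) / noise_var) / precision
    return mean, 1.0 / precision

# prior mu ~ Normal(mean 1, variance 5); noise variance sigma^2 = 2; data 9 and 8
post_mean, post_var = normal_posterior(1.0, 5.0, 2.0, [9.0, 8.0])
```

Under these assumptions the posterior over mu is normal with mean 7.25 and variance 5/6, so samples of `:mu` from a correct engine should concentrate there.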

Measure Theoretic Challenges

(defquery which-nationality [gpa]
  (let [nationality (sample (categorical [["USA" 0.25] ["India" 0.75]]))
        simulated_gpa (if (= nationality "USA")
                        (american-gpa)
                        (indian-gpa))]
    (observe (dirac simulated_gpa) gpa)
    (predict :nationality nationality)))

p(nationality = "USA" | gpa = 4.0) = ?

The “Indian GPA problem”

American GPA Distribution [0,4]

(defn american-gpa []
  (if (sample (flip 0.95))
    (* 4 (sample (beta 8 2)))
    (if (sample (flip 0.85))
      4.0
      0.0)))

Indian GPA Distribution [0,10]

(defn indian-gpa []
  (if (sample (flip 0.99))
    (* 10 (sample (beta 5 5)))
    (if (sample (flip 0.1))
      0.0
      10.0)))

Mixed GPA Distribution

(defn student-gpa []
  (if (sample (flip 0.25))
    (american-gpa)
    (indian-gpa)))

Measure Theoretic Challenges

(defquery which-nationality [gpa tolerance]
  (let [nationality (sample (categorical [["USA" 0.25] ["India" 0.75]]))
        simulated_gpa (if (= nationality "USA")
                        (american-gpa)
                        (indian-gpa))]
    (observe (normal simulated_gpa tolerance) gpa)
    (predict :nationality nationality)))

p(nationality = "USA" | gpa = 4.0) = ?

The “Indian GPA problem” by Russell
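To see why the tolerance matters, the posterior for the tolerance version of the query can be computed numerically. The Python sketch below integrates the normal observation density against each GPA distribution, treating the continuous beta part and the point masses separately. The function names, the midpoint rule, and the step count are choices made for this illustration; only the mixture weights and beta parameters come from the talk.

```python
import math

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def beta_pdf(x, a, b):
    if x <= 0.0 or x >= 1.0:
        return 0.0
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1.0 - x) ** (b - 1)

def smoothed_likelihood(gpa, tol, scale, a, b, p_cont, atoms, steps=20000):
    """p(gpa | nationality) under a normal(simulated_gpa, tol) observation."""
    # continuous part: midpoint-rule integral over the scaled beta density
    h = scale / steps
    cont = 0.0
    for i in range(steps):
        g = (i + 0.5) * h
        cont += normal_pdf(gpa, g, tol) * beta_pdf(g / scale, a, b) / scale * h
    # point masses contribute the normal density at their location
    disc = sum(w * normal_pdf(gpa, g, tol) for g, w in atoms)
    return p_cont * cont + (1.0 - p_cont) * disc

def p_usa(gpa, tol):
    """Posterior probability of "USA" given the observed gpa and tolerance."""
    usa = smoothed_likelihood(gpa, tol, 4.0, 8, 2, 0.95, [(4.0, 0.85), (0.0, 0.15)])
    india = smoothed_likelihood(gpa, tol, 10.0, 5, 5, 0.99, [(0.0, 0.1), (10.0, 0.9)])
    return 0.25 * usa / (0.25 * usa + 0.75 * india)
```

Evaluating `p_usa(4.0, 1.0)` and `p_usa(4.0, 0.01)` shows the answer moving from favouring "India" to favouring "USA" as the tolerance shrinks. In the exact-match limit only the American distribution places a point mass on 4.0, so the posterior probability of "USA" tends to 1.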


Where are we now?

Systems

[Figure: timeline (1990, 2000, 2010) of probabilistic programming systems, arranged under PL, ML, STATS, and AI, with regions labelled "Discrete RV’s Only" and "Bounded Recursion". Systems shown: HANSAI, IBAL, Figaro, WinBUGS, BUGS, JAGS, STAN, LibBi, Venture, Anglican, Church, Probabilistic-C, infer.NET, webChurch, Blog, Factorie, Prism, Prolog, KMP, Problog, Simula.]

Ways forward...

Trace Probability

• observe data points $y_n$

• internal random choices $x_n$

• simulate from $f(x_n \mid x_{1:n-1})$ by running the program forward

• weight execution traces by $g(y_n \mid x_{1:n})$

$p(y_{1:N}, x_{1:N}) = \prod_{n=1}^{N} g(y_n \mid x_{1:n})\, f(x_n \mid x_{1:n-1})$

[Figure: execution traces drawn as chains of random choices x1, x2, x3 with observations y1, y2, y3; each x_n groups the individual choices made between consecutive observes (x11, x12, x13; x21, x22; etc).]

SMC

Iteratively,

- simulate
- weight
- resample

[Figure: particles advancing from n = 1 to n = 2; axes labelled Observe and Particle.]
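The simulate, weight, resample loop can be sketched as a bootstrap particle filter. The Python below is an illustration under stated assumptions, not the engine from the talk: the toy random-walk model, the function names, and the use of multinomial resampling are all choices made here.

```python
import math
import random

def normal_logpdf(x, mean, sd):
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))

def bootstrap_smc(observations, num_particles, rng):
    """Bootstrap particle filter for a toy model (an assumption):
    x_n ~ Normal(x_{n-1}, 1) with x_0 = 0, and y_n ~ Normal(x_n, 1).
    Returns the final particles and the log marginal-likelihood estimate.
    """
    particles = [0.0] * num_particles
    log_evidence = 0.0
    for y in observations:
        # simulate: run every particle forward to the next observe
        particles = [rng.gauss(x, 1.0) for x in particles]
        # weight: score the observation under each particle
        log_w = [normal_logpdf(y, x, 1.0) for x in particles]
        max_w = max(log_w)
        w = [math.exp(lw - max_w) for lw in log_w]
        log_evidence += max_w + math.log(sum(w) / num_particles)
        # resample: multinomial resampling in proportion to the weights
        particles = rng.choices(particles, weights=w, k=num_particles)
    return particles, log_evidence
```

With a single observation y the marginal likelihood of this toy model is a normal density with mean 0 and variance 2, which gives a simple check on the estimate. The resampling step needs every particle's weight before any particle can continue, so it acts as a synchronisation barrier across particles.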

SMC for Probabilistic Programming

Intuitively

- run
- wait
- fork

[Figure: threads running between observe delimiters; axis labelled Threads; annotations: observe delimiter, continuations.]

SMC Inner Loop

• Sequential Monte Carlo is now a building block for other inference techniques

• Particle MCMC
  - PIMH : “particle independent Metropolis-Hastings”
  - iCSMC : “iterated conditional SMC”

[Figure: repeated SMC sweeps s=1, s=2, s=3, each over steps n.]

[Andrieu, Doucet, Holenstein 2010]

[W., van de Meent, Mansinghka 2014]


SMC slowed down for clarity

SMC Parallelism Bottleneck

Particle Cascade

Asynchronously

- simulate
- weight
- branch

[Figure: particles advancing from n = 1 to n = 2.]

Paige, W., Doucet, Teh; NIPS 2014


Theoretical Properties

The particle cascade provides an unbiased estimator of the marginal likelihood, whose variance is bounded by a term that decreases in inverse proportion to the number of initial particles K0:

$\hat{p}(y_{0:n}) := \frac{1}{K_0} \sum_{k=1}^{K_n} W_n^k$

Theorem: For any K0 ≥ 1 and n ≥ 0, $\mathbb{E}[\hat{p}(y_{0:n})] = p(y_{0:n})$.

Theorem: For any n ≥ 0, there exists a constant $a_n$ such that $\mathbb{V}[\hat{p}(y_{0:n})] < \frac{a_n}{K_0}$.


Conclusion


Bubble Up

Inference

Probabilistic Programming Language

Models

Applications

Probabilistic Programming System

Thank You

• Questions?

• Funding: DARPA, Amazon, Microsoft

Opportunities

• Parallelism “Asynchronous Anytime Sequential Monte Carlo” [Paige, W., Doucet, Teh NIPS 2014]

• Backwards passing “Particle Gibbs with Ancestor Sampling for Probabilistic Programs” [van de Meent, Yang, Mansinghka, W. AISTATS 2015]

• Search “Maximum a Posteriori Estimation by Search in Probabilistic Models” [Tolpin, W., SOCS, 2015]

• Adaptation “Output-Sensitive Adaptive Metropolis-Hastings for Probabilistic Programs” [Tolpin, van de Meent, Paige, W ; in submission]

• Novel proposals “Adaptive PMCMC” [Paige, W.; in submission]

Probabilistic-C

$z_0 \sim \mathrm{Discrete}([1/K, \ldots, 1/K])$

$z_n \mid z_{n-1} \sim \mathrm{Discrete}(T_{z_{n-1}})$

$y_n \mid z_n \sim \mathrm{Normal}(\mu_{z_n}, \sigma^2)$

Paige & W.; ICML 2014


How can you participate?

Ways to Participate

• Contribute applications

• https://bitbucket.org/fwood/anglican-examples

• Contribute inference algorithms

• https://bitbucket.org/dtolpin/embang


An Analogy

Automatic Differentiation Supervised Learning

Probabilistic Programming Unsupervised Learning

General Purpose Inference

(defquery sat-solver [N formula]
  "explores an N-dimensional universe for
   worlds that satisfy the formula"
  (let [state (repeatedly N (fn [] (sample (flip 0.5))))]
    (observe (dirac (formula state)) true)
    (predict :state state)))

(defdist dirac
  "Dirac distribution"
  [x] []
  (sample [this] x)
  (observe [this value]
    (if (= x value) 0.0 NegInf)))

(defm satisfiable-3cnf-formula [state]
  (let [v (fn [i] (nth state i))]
    (and (or (v 0) (not (v 1)) (not (v 2)))
         (or (not (v 0)) (v 1) (v 2))
         (or (not (v 0)) (not (v 1)) (not (v 2))))))
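For a formula this small, the posterior that `sat-solver` targets can be written down by brute force. The Python sketch below enumerates all 2^N worlds and keeps those the formula accepts. With a uniform prior from independent fair flips and a Dirac observe, the posterior is uniform over the satisfying worlds. The helper names are hypothetical, and enumeration stands in for inference purely as an illustration.

```python
from itertools import product

def formula(state):
    """The 3-CNF formula from satisfiable-3cnf-formula on the slide."""
    v0, v1, v2 = state
    return ((v0 or not v1 or not v2)
            and (not v0 or v1 or v2)
            and (not v0 or not v1 or not v2))

def satisfying_worlds(formula, n):
    """Enumerate all 2^n worlds and keep those the formula accepts."""
    return [state for state in product([False, True], repeat=n) if formula(state)]

worlds = satisfying_worlds(formula, 3)
# uniform prior + Dirac observe => uniform posterior over satisfying worlds
posterior = {state: 1.0 / len(worlds) for state in worlds}
```

Each of the three clauses rules out exactly one of the eight worlds, leaving five satisfying assignments.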

General Purpose Inference

(defquery md5-inverse [L md5str]
  "conditional distribution of strings that
   map to the same MD5 hashed string"
  (let [mesg (sample (string-generative-model L))]
    (observe (dirac md5str) (md5 mesg))
    (predict :message mesg)))

[Figure labels: NN, AI, RLPM, Vision]


Not Sum-Product: Bayesian HMM

Suppose the transition matrix is unknown:

$T_k \sim \mathrm{Dirichlet}(\alpha_k)$

Paige & W.; ICML 2014

Range of Effectiveness

[Figure: timeline (1990, 2000, 2010) of probabilistic programming systems, arranged under PL, ML, STATS, and AI. Systems shown: HANSAI, IBAL, Figaro, WinBUGS, BUGS, JAGS, STAN, LibBi, Venture, Anglican, Church, Probabilistic-C, infer.NET, webChurch, Blog, Factorie, Prism, Prolog, KMP.]

Continuous Variables

(defm marsaglia-normal [mean var]
  (let [d (uniform-continuous -1.0 1.0)
        x (sample d)
        y (sample d)
        s (+ (* x x) (* y y))]
    (if (< s 1)
      (+ mean (* (sqrt var)
                 (* x (sqrt (* -2 (/ (log s) s))))))
      (marsaglia-normal mean var))))
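The same rejection scheme, the Marsaglia polar method, can be written in a few lines of Python. This is a sketch for comparison with the Anglican version: the function name and the explicit `rng` argument are choices made here, and the extra guard against s = 0 is an addition that the slide's code does not have.

```python
import math
import random

def marsaglia_normal(mean, var, rng):
    """Marsaglia polar method: rejection-sample a point in the unit disc,
    then transform one coordinate into a normal draw."""
    while True:
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        s = x * x + y * y
        # also reject s == 0 to avoid log(0); the slide's version omits this guard
        if 0.0 < s < 1.0:
            return mean + math.sqrt(var) * x * math.sqrt(-2.0 * math.log(s) / s)

rng = random.Random(0)
draws = [marsaglia_normal(1.0, 5.0, rng) for _ in range(20000)]
```

The number of rejections before acceptance is itself random and unbounded, which is why this sampler appears in the talk as an example of unbounded recursion over continuous variables.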


Scalability: Particle Count

• Comparison across particle-based inference approaches: raw speed of drawing samples


Unbounded Recursion

Expressivity Efficiency


Credits

• Code highlighting: http://hilite.me

Forward Inference (SMC)

[Figure: axes labelled Observe and Particle / Continuation.]

Bayesian Nonparametrics

(defm pick-a-stick [stick v l k]
  ; picks a stick given a stick generator
  ; given a value v ~ uniform-continuous(0,1)
  ; should be called with l = 0.0, k=1
  (let [u (+ l (stick k))]
    (if (> u v)
      k
      (pick-a-stick stick v u (+ k 1)))))

(defm remaining [b k]
  (if (<= k 0)
    1
    (* (- 1 (b k)) (remaining b (- k 1)))))

(defm polya [stick]
  ; given a stick generating function
  ; polya returns a function that samples
  ; stick indexes from the stick lengths
  (let [uc01 (uniform-continuous 0 1)]
    (fn []
      (let [v (sample uc01)]
        (pick-a-stick stick v 0.0 1)))))

(defm dirichlet-process-breaking-rule [alpha k]
  (sample (beta 1.0 alpha)))

(defm stick [breaking-rule]
  ; given a breaking-rule function which
  ; returns a value between 1 and 0 given a
  ; stick index k returns a function that
  ; returns the stick length for index k
  (let [b (mem breaking-rule)]
    (fn [k]
      (if (< 0 k)
        (* (b k) (remaining b (- k 1)))
        0))))

[Figure: the unit interval divided into (stick 1), (stick 2), (stick 3), (stick 4), (…), (stick 6), with the point v marked, where v is drawn as (sample (uniform-continuous 0 1)).]
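The stick-breaking construction can also be traced in ordinary Python. The sketch below mirrors `stick`, `remaining`, `pick-a-stick`, and `polya` with hypothetical Python names; a dictionary plays the role of `mem`, so each break proportion is drawn once and reused. It is an illustration of the construction, not a translation that Anglican would accept.

```python
import random

def make_stick(alpha, rng):
    """Return stick(k): the length of the k-th stick (k = 1, 2, ...) in a
    Dirichlet-process stick-breaking construction with Beta(1, alpha) breaks."""
    breaks = {}

    def b(k):
        # memoised break proportion, like (mem breaking-rule) on the slide
        if k not in breaks:
            breaks[k] = rng.betavariate(1.0, alpha)
        return breaks[k]

    def stick(k):
        if k <= 0:
            return 0.0
        remaining = 1.0
        for j in range(1, k):
            remaining *= 1.0 - b(j)
        return b(k) * remaining

    return stick

def pick_a_stick(stick, v):
    """Walk along the sticks until the cumulative length exceeds v."""
    total, k = 0.0, 1
    while True:
        total += stick(k)
        if total > v:
            return k
        k += 1

def polya(stick, rng):
    """Return a sampler of stick indexes, as polya does on the slide."""
    return lambda: pick_a_stick(stick, rng.random())
```

Sticks are created lazily, only as far as a particular draw of v requires, which is how a distribution over infinitely many indexes is represented with finite work per sample.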

Syntax & Implementation Considerations

• Embedded vs. Standalone

• Imperative vs. functional

• Lisp vs. Python vs. C vs.