An Introduction to Bayesian Optimisation and (Potential) Applications in Materials Science
Kirthevasan Kandasamy
Machine Learning Dept, CMU
Electrochemical Energy Symposium
Pittsburgh, PA, November 2017

Designing Electrolytes in Batteries
Black-box Optimisation in Computational Astrophysics
[Figure: a cosmological simulator maps input parameters (e.g. the Hubble constant, the baryonic density) to an observation, from which a likelihood score is computed.]
Black-box Optimisation
[Figure: an expensive black-box function.]
Other examples:
- Pre-clinical drug discovery
- Optimal policy in autonomous driving
- Synthetic gene design
Black-box Optimisation
f : X → R is an expensive, black-box function, accessible only via noisy evaluations.
Let x⋆ = argmax_x f(x).
[Figure: f(x) plotted against x, with the optimum x⋆ and the optimal value f(x⋆) marked.]
Outline
- Part I: Bayesian Optimisation
  - Bayesian models for f
  - Two algorithms: upper confidence bounds & Thompson sampling
- Part II: Some Modern Challenges
  - Multi-fidelity Optimisation
  - Parallelisation
Bayesian Models for f, e.g. Gaussian Processes (GPs)
GP: a distribution over functions from X to R.
[Figure: sample functions from the prior GP with no observations; then, given observations, sample functions from the posterior GP.]
After t observations, f(x) ∼ N(µ_t(x), σ_t^2(x)).
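The posterior quantities µ_t(x) and σ_t^2(x) have a closed form. A minimal sketch for a zero-mean GP with a squared-exponential kernel (the unit lengthscale, signal variance, and noise level are illustrative choices, not values from the slides):

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential kernel matrix between point sets A (n,d) and B (m,d)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return signal_var * np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_posterior(X_obs, y_obs, X_query, noise_var=1e-4):
    """Posterior mean mu_t(x) and variance sigma_t^2(x) given t observations."""
    K = rbf_kernel(X_obs, X_obs) + noise_var * np.eye(len(X_obs))
    K_s = rbf_kernel(X_query, X_obs)
    K_ss = rbf_kernel(X_query, X_query)
    mu = K_s @ np.linalg.solve(K, y_obs)                 # posterior mean
    cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)         # posterior covariance
    return mu, np.diag(cov)
```

Near an observed point the posterior variance collapses towards the noise level; far from all observations it reverts to the prior variance.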
Bayesian Optimisation with Upper Confidence Bounds
Model f ∼ GP.
Gaussian Process Upper Confidence Bound (GP-UCB) (Srinivas et al. 2010)
[Figure: posterior GP, the upper confidence bound ϕ_t = µ_{t−1} + β_t^{1/2} σ_{t−1}, and the chosen point x_t.]
1) Construct posterior GP. 2) ϕ_t = µ_{t−1} + β_t^{1/2} σ_{t−1} is a UCB.
3) Choose x_t = argmax_x ϕ_t(x). 4) Evaluate f at x_t.
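Steps 1)–4) can be sketched as a loop over a finite candidate grid. This is a toy sketch: the kernel, the lengthscale 0.25, and the constant β = 4 are illustrative assumptions, not the schedule analysed by Srinivas et al.:

```python
import numpy as np

def gp_ucb(f, X_grid, n_iters=25, beta=4.0, noise_var=1e-4):
    """GP-UCB: each round, build the posterior GP, form the UCB
    phi_t = mu_{t-1} + beta^{1/2} sigma_{t-1}, maximise it, and evaluate f there."""
    k = lambda A, B: np.exp(-0.5 * ((A[:, None] - B[None, :]) / 0.25) ** 2)
    X_obs, y_obs = [], []
    for t in range(n_iters):
        if not X_obs:
            mu, var = np.zeros(len(X_grid)), np.ones(len(X_grid))    # prior GP
        else:
            Xo, yo = np.array(X_obs), np.array(y_obs)
            K = k(Xo, Xo) + noise_var * np.eye(len(Xo))
            Ks = k(X_grid, Xo)
            mu = Ks @ np.linalg.solve(K, yo)                         # mu_{t-1}
            var = np.clip(1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1),
                          1e-12, None)                               # sigma^2_{t-1}
        phi = mu + np.sqrt(beta * var)          # 2) the upper confidence bound
        x_t = X_grid[np.argmax(phi)]            # 3) maximise the UCB
        X_obs.append(x_t)
        y_obs.append(f(x_t))                    # 4) evaluate f at x_t
    return X_obs[int(np.argmax(y_obs))]         # best evaluated point
```

The β^{1/2} σ term drives exploration where uncertainty is high; as observations accumulate, the maximiser of ϕ_t concentrates near the maximiser of µ.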
GP-UCB (Srinivas et al. 2010)
[Figure: GP-UCB iterations at t = 1, 2, 3, 4, 5, 6, 7, 11, 25; the evaluations progressively concentrate around the optimum.]
Bayesian Optimisation with Thompson Sampling
Model f ∼ GP(0, κ).
Thompson Sampling (TS) (Thompson, 1933)
[Figure: posterior GP, a sample g drawn from it, and the chosen point x_t.]
1) Construct posterior GP. 2) Draw sample g from posterior.
3) Choose x_t = argmax_x g(x). 4) Evaluate f at x_t.
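One TS round can be sketched by drawing the posterior sample g jointly on a finite candidate grid (the squared-exponential kernel and the lengthscale 0.25 are illustrative assumptions):

```python
import numpy as np

def ts_next_point(X_obs, y_obs, X_grid, noise_var=1e-4, seed=0):
    """One round of GP Thompson sampling: draw a sample g from the
    posterior GP on a candidate grid and return x_t = argmax_x g(x)."""
    rng = np.random.default_rng(seed)
    k = lambda A, B: np.exp(-0.5 * ((A[:, None] - B[None, :]) / 0.25) ** 2)
    K = k(X_obs, X_obs) + noise_var * np.eye(len(X_obs))
    Ks = k(X_grid, X_obs)
    mu = Ks @ np.linalg.solve(K, y_obs)                     # posterior mean
    cov = k(X_grid, X_grid) - Ks @ np.linalg.solve(K, Ks.T)
    cov += 1e-8 * np.eye(len(X_grid))                       # jitter for stability
    g = rng.multivariate_normal(mu, cov)                    # the sample g
    return X_grid[np.argmax(g)]
```

Unlike the UCB rule, the randomness of g provides the exploration: regions of high posterior uncertainty occasionally produce samples whose maximum lies there.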
More on Bayesian Optimisation
Theoretical results: both UCB and TS will eventually find the optimum under certain smoothness assumptions on f.
Other criteria for selecting x_t:
- Expected improvement (Jones et al. 1998)
- Probability of improvement (Kushner et al. 1964)
- Predictive entropy search (Hernández-Lobato et al. 2014)
- Information-directed sampling (Russo & Van Roy 2014)
Other Bayesian models for f:
- Neural networks (Snoek et al. 2015)
- Random forests (Hutter 2009)
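The first of these criteria, expected improvement, has a closed form under the GP posterior f(x) ∼ N(µ, σ²). A minimal sketch in the maximisation convention, where y_best is the incumbent best observed value:

```python
import math

def expected_improvement(mu, sigma, y_best):
    """EI(x) = E[max(f(x) - y_best, 0)] when f(x) ~ N(mu, sigma^2).
    Closed form: sigma * (z * Phi(z) + phi(z)) with z = (mu - y_best) / sigma."""
    if sigma <= 0:
        return max(mu - y_best, 0.0)                        # degenerate: no uncertainty
    z = (mu - y_best) / sigma
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2)))          # standard normal CDF
    phi = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)   # standard normal pdf
    return sigma * (z * Phi + phi)
```

EI is always non-negative and grows with both the predicted mean and the predictive uncertainty, which is what lets it trade off exploitation against exploration.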
Some Modern Challenges/Opportunities
1. Multi-fidelity Optimisation (Kandasamy et al. NIPS 2016 a&b, Kandasamy et al. ICML 2017)
2. Parallelisation (Kandasamy et al. arXiv 2017)
1. Multi-fidelity Optimisation (Kandasamy et al. NIPS 2016 a&b, Kandasamy et al. ICML 2017)
Desired function f is very expensive, but ... we have access to approximations f1, f2, f3 ≈ f which are cheaper to evaluate.
[Figure: f with its optimum x⋆, alongside the approximations f1, f2, f3.]
E.g. f: a real-world battery experiment; f2: a lab experiment; f1: a computer simulation.
MF-GP-UCB (Kandasamy et al. NIPS 2016b)
Multi-fidelity Gaussian Process Upper Confidence Bound
With 2 fidelities (1 approximation),
[Figure: the approximation f^(1) and the target f^(2) at t = 14, with the chosen point x_t and the optimum x⋆.]
Theorem: MF-GP-UCB finds the optimum x⋆ with fewer resources than GP-UCB on f^(2).
Can be extended to multiple approximations and continuous approximations.
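A loose sketch of a two-fidelity selection step in this spirit: form a UCB for f^(2) by combining both fidelities, then query the cheap fidelity while it is still informative. This simplifies the paper's algorithm; β, the bias bounds ζ_m (bounding |f^(2) − f^(m)|), and the threshold γ are illustrative assumptions:

```python
import numpy as np

def mf_ucb_step(posteriors, beta, zetas, gamma, X_grid):
    """Two-fidelity UCB step (sketch).  Combined UCB for the target:
      phi_t(x) = min_m [ mu_m(x) + beta^{1/2} sigma_m(x) + zeta_m ].
    Query the cheap fidelity m=1 while its uncertainty there exceeds gamma,
    otherwise query the expensive target fidelity m=2."""
    ucbs = [mu + np.sqrt(beta) * sig + z
            for (mu, sig), z in zip(posteriors, zetas)]
    phi = np.minimum(*ucbs)                  # each fidelity gives a valid UCB
    idx = np.argmax(phi)                     # next query point
    sigma_low = posteriors[0][1][idx]        # cheap-fidelity uncertainty there
    m_t = 1 if sigma_low > gamma else 2      # fidelity to query
    return X_grid[idx], m_t
```

Taking the minimum over fidelities is the key step: since each f^(m) is within ζ_m of the target, every term is an upper bound on f^(2), so the tightest one is too.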
Experiment: Cosmological Maximum Likelihood Inference
- Type Ia supernovae data
- Maximum likelihood inference for 3 cosmological parameters:
  - Hubble constant H0
  - Dark energy fraction ΩΛ
  - Dark matter fraction ΩM
- Likelihood: Robertson-Walker metric (Robertson 1936);
  requires numerical integration for each point in the dataset.
Experiment: Cosmological Maximum Likelihood Inference
3 cosmological parameters (d = 3).
Fidelities: integration on grids of size (10^2, 10^4, 10^6) (M = 3).
[Figure: optimisation performance against computational effort.]
Experiment: Hartmann-3D
2 approximations (3 fidelities). We want to optimise the m = 3rd fidelity, which is the most expensive; the m = 1st fidelity is the cheapest.
[Figure: query frequencies for Hartmann-3D; the number of queries at each fidelity m = 1, 2, 3 plotted against f^(3)(x).]
2. Parallelising Function Evaluations
Parallelisation with M workers: can evaluate f at M different points at the same time.
E.g. test M different battery solvents at the same time.
[Figure: sequential evaluations with one worker, vs. parallel evaluations with M workers in asynchronous and synchronous modes.]
Parallel Thompson Sampling (Kandasamy et al. arXiv 2017)

Asynchronous: asyTS
At any given time,
1. (x′, y′) ← wait for a worker to finish.
2. Compute posterior GP.
3. Draw a sample g ∼ GP.
4. Re-deploy worker at argmax g.

Synchronous: synTS
At any given time,
1. {(x′_m, y′_m)}_{m=1..M} ← wait for all workers to finish.
2. Compute posterior GP.
3. Draw M samples g_m ∼ GP, ∀m.
4. Re-deploy worker m at argmax g_m, ∀m.
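The asynchronous variant can be sketched with a thread pool standing in for the M workers. The kernel, lengthscale, and budget below are illustrative; a real deployment would dispatch lab experiments or simulations rather than Python callables:

```python
import concurrent.futures as cf
import numpy as np

def asy_ts(f, X_grid, n_evals=12, n_workers=3, seed=0):
    """Asynchronous parallel TS (sketch): whenever a worker frees up, refit
    the posterior GP on the completed evaluations, draw one posterior
    sample g, and re-deploy that worker at argmax g."""
    rng = np.random.default_rng(seed)
    k = lambda A, B: np.exp(-0.5 * ((A[:, None] - B[None, :]) / 0.25) ** 2)

    def draw_next(X_obs, y_obs):
        if not X_obs:                                  # no data yet: pick at random
            return X_grid[rng.integers(len(X_grid))]
        Xo, yo = np.array(X_obs), np.array(y_obs)
        K = k(Xo, Xo) + 1e-4 * np.eye(len(Xo))
        Ks = k(X_grid, Xo)
        mu = Ks @ np.linalg.solve(K, yo)
        cov = k(X_grid, X_grid) - Ks @ np.linalg.solve(K, Ks.T)
        g = rng.multivariate_normal(mu, cov + 1e-8 * np.eye(len(X_grid)))
        return X_grid[np.argmax(g)]                    # maximiser of the sample g

    X_obs, y_obs = [], []
    with cf.ThreadPoolExecutor(n_workers) as pool:
        pending = {}
        for _ in range(n_workers):                     # initial deployment
            x = draw_next(X_obs, y_obs)
            pending[pool.submit(f, x)] = x
        while len(y_obs) < n_evals:
            done, _ = cf.wait(pending, return_when=cf.FIRST_COMPLETED)
            for fut in done:                           # collect each finished worker
                X_obs.append(pending.pop(fut))
                y_obs.append(fut.result())
                if len(y_obs) + len(pending) < n_evals:
                    x = draw_next(X_obs, y_obs)        # ... and re-deploy it
                    pending[pool.submit(f, x)] = x
    return X_obs[int(np.argmax(y_obs))]
```

Because each re-deployment needs only a single posterior sample, no worker ever idles waiting for the slowest one, which is the advantage of asyTS over synTS when evaluation times vary.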
Experiment: Branin-2D, M = 4
Evaluation time sampled from a uniform distribution.
[Figure: optimisation performance (log scale) over time for synRAND, synHUCB, synUCBPE, synTS, asyRAND, asyUCB, asyHUCB, asyEI, asyHTS, and asyTS.]
Experiment: Hartmann-18D, M = 25
Evaluation time sampled from an exponential distribution.
[Figure: optimisation performance over time for synRAND, synHUCB, synUCBPE, synTS, asyRAND, asyUCB, asyHUCB, asyEI, asyHTS, and asyTS.]
Summary
- Black-box optimisation methods are used in several scientific and engineering applications.
- Bayesian optimisation: a method for black-box optimisation which uses Bayesian uncertainty estimates for f.
- Some modern challenges:
  - Multi-fidelity optimisation
  - Parallel evaluations
  - and several more ...

Thank you. Slides are up on my website: www.cs.cmu.edu/~kkandasa