Bayesian Optimization (BO)


Javad Azimi

Fall 2010

http://web.engr.oregonstate.edu/~azimi/

Outline

• Formal Definition
• Application
• Bayesian Optimization Steps
  – Surrogate Function (Gaussian Process)
  – Acquisition Function
    • PMAX
    • IEMAX
    • MPI
    • MEI
    • UCB
    • GP-Hedge

Formal Definition

• Input: a budget of experiments and an expensive, unknown function f over an input space X that can only be evaluated point by point.

• Goal: find the maximizer x* = argmax_x f(x) using as few evaluations as possible.

Fuel Cell Application

[Figure: schematic of a microbial fuel cell (MFC). Bacteria at the anode oxidize fuel (organic matter) into oxidation products (CO2), releasing electrons (e-) that travel to the cathode, where O2 and H+ combine into H2O. "This is how an MFC works."]

[Figure: SEM image of bacteria sp. on Ni nanoparticle enhanced carbon fibers.]

The nano-structure of the anode significantly impacts electricity production.

We want to optimize the anode nano-structure to maximize power output by selecting a set of experiments.

Big Picture

• Since running an experiment is very expensive, we use BO.
• Select one experiment to run at a time, based on the results of previous experiments (a minimal sketch of this loop follows below).

[Diagram: the BO loop: current experiments → our current model → select a single experiment → run the experiment → repeat.]
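A minimal sketch of this select-and-run loop, assuming a generic surrogate model and acquisition function; all names below are illustrative, not from the slides:

```python
def bayesian_optimization(f, candidates, surrogate, acquisition, n_experiments):
    """Generic BO loop: update the model, select one experiment, run it, repeat.

    f           : the expensive experiment (e.g. measured fuel-cell power)
    candidates  : pool of possible experiments (points in the input space)
    surrogate   : model with fit(X, y) and predict(x) -> (mean, variance)
    acquisition : scores a point from its posterior mean/variance and the best y so far
    """
    X, y = [], []
    for _ in range(n_experiments):
        surrogate.fit(X, y)                      # "our current model"
        best = max(y) if y else float("-inf")
        x_next = max(candidates,                 # "select single experiment"
                     key=lambda x: acquisition(*surrogate.predict(x), best))
        X.append(x_next)
        y.append(f(x_next))                      # "run experiment"
    return X, y
```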

BO Main Steps

• Surrogate Function (Response Surface, Model)
  – Makes a posterior over unobserved points based on the prior.
  – Its parameters might be based on the prior. Remember, it is a BAYESIAN approach.
• Acquisition Criterion (Function)
  – Decides which sample should be selected next.

Surrogate Function

• Simulates the distribution of the unknown function based on the prior.
  – Deterministic (classical linear regression, …)
    • There is a single deterministic prediction for each point x in the input space.
  – Stochastic (Bayesian regression, Gaussian Process, …)
    • There is a distribution over the prediction for each point x in the input space (e.g. a normal distribution).
  – Example (illustrated in the snippet below)
    • Deterministic: f(x1) = y1, f(x2) = y2
    • Stochastic: f(x1) ~ N(y1, 2), f(x2) ~ N(y2, 5)
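A toy illustration of this difference in code, with made-up numbers mirroring the example above (the dictionaries are purely illustrative):

```python
# Deterministic surrogate: a single predicted value per input point.
def deterministic_predict(x):
    return {"x1": 3.0, "x2": 1.5}[x]                 # f(x1) = y1, f(x2) = y2

# Stochastic surrogate: a whole distribution (here mean and variance) per input point.
def stochastic_predict(x):
    return {"x1": (3.0, 2.0), "x2": (1.5, 5.0)}[x]   # f(x1) ~ N(y1, 2), f(x2) ~ N(y2, 5)
```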

Gaussian Process(GP)

• A Gaussian process is a collection number of random variables, any finite number of which have a joint Gaussian distribution.– Consistency requirement or marginalization

property.• Marginalization property:
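Concretely, for a jointly Gaussian block of variables, dropping (marginalizing out) one block leaves a Gaussian with the corresponding sub-mean and sub-covariance:

$$
\begin{pmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \end{pmatrix} \sim \mathcal{N}\!\left( \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right) \;\Longrightarrow\; \mathbf{y}_1 \sim \mathcal{N}(\boldsymbol{\mu}_1, \Sigma_{11})
$$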

Gaussian Process (GP)

• Formal prediction: see the equations after this list.
• Interesting points:
  – The squared exponential covariance function corresponds to Bayesian linear regression with an infinite number of basis functions.
  – The variance is independent of the observed values.
  – The mean is a linear combination of the observed values.
  – If the covariance function specifies the entries of the covariance matrix, marginalization is satisfied!
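For reference, the standard noise-free GP predictive equations, which the points above describe, are (X are the training inputs, y the observed values, x_* a test point, and k the covariance function):

$$
\mu(x_*) = k(x_*, X)\, K(X, X)^{-1}\, \mathbf{y}
$$

$$
\sigma^2(x_*) = k(x_*, x_*) - k(x_*, X)\, K(X, X)^{-1}\, k(X, x_*)
$$

The mean is a weighted (linear) combination of the entries of y, while the variance does not involve y at all.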

Gaussian Process (GP)

• A Gaussian Process is:
  – An exact interpolating regression method.
    • It predicts the training data perfectly (not true in classical regression).
  – A natural generalization of linear regression.
    • A nonlinear regression approach!
  – A simple example of a GP can be obtained from Bayesian regression.
    • Identical results.
  – A specification of a distribution over functions.

Gaussian Process (2): Distribution over Functions

[Figure: GP posterior showing the 95% confidence interval for each point x, together with three sampled functions.]

Gaussian Process (2): GP vs. Bayesian Regression

• Bayesian regression:
  – Distribution over weights.
  – The prior is defined over the weights.
• Gaussian Process:
  – Distribution over functions.
  – The prior is defined over the function space.
• These are the same, but viewed from different perspectives.

Short Summary

• Given any unobserved point z, we can define a normal distribution over its predicted value such that:
  – Its mean is a linear combination of the observed values.
  – Its variance is related to its distance from the observed values (closer to the observed data, less variance); see the sketch below.
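A small numerical sketch of these two properties, using a squared exponential kernel; all function names and numbers are illustrative, not from the slides:

```python
import numpy as np

def sq_exp_kernel(a, b, length_scale=1.0):
    """Squared exponential covariance between two sets of 1-D points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_predict(X_obs, y_obs, X_new, jitter=1e-8):
    """Noise-free GP posterior mean and variance at the points X_new."""
    K = sq_exp_kernel(X_obs, X_obs) + jitter * np.eye(len(X_obs))
    K_s = sq_exp_kernel(X_new, X_obs)
    alpha = np.linalg.solve(K, y_obs)
    mean = K_s @ alpha                       # linear combination of the observed values
    v = np.linalg.solve(K, K_s.T)
    var = sq_exp_kernel(X_new, X_new).diagonal() - np.sum(K_s * v.T, axis=1)
    return mean, var

X_obs = np.array([0.0, 1.0, 3.0])
y_obs = np.array([1.0, 2.0, 0.5])
X_new = np.array([1.1, 2.0, 5.0])            # near, between, and far from the data
mean, var = gp_predict(X_obs, y_obs, X_new)
print(mean, var)                             # variance grows with distance from the data
```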

BO Main Steps

• Surrogate Function (Response Surface, Model)
  – Makes a posterior over unobserved points based on the prior.
  – Its parameters might be based on the prior. Remember, it is a BAYESIAN approach.
• Acquisition Criterion (Function)
  – Decides which sample should be selected next.

Bayesian Optimization: Acquisition Criterion

• Remember: we are looking for the maximizer x* = argmax_x f(x).
• Input:
  – The set of observed data.
  – A set of candidate points with their corresponding means and variances.
• Goal: choose which point should be selected next so that we get to the maximizer of the function faster.
• Different acquisition criteria (acquisition functions, or policies).

Policies

• Maximum Mean (MM)
• Maximum Upper Interval (MUI)
• Maximum Probability of Improvement (MPI)
• Maximum Expected Improvement (MEI)

Policies: Maximum Mean (MM)

• Returns the point with the highest expected value (see the formula below).
• Advantage:
  – If the model is stable and has been learnt very well, it performs very well.
• Disadvantage:
  – There is a high chance of falling into a local optimum (it only exploits).
• Can it eventually converge to the global optimum?
  – No.
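In terms of the GP posterior mean μ(x), MM selects:

$$
x_{\text{MM}} = \arg\max_{x \in \mathcal{X}} \mu(x)
$$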

Policies: Maximum Upper Interval (MUI)

• Returns the point with the highest 95% upper interval (see the formula below).
• Advantage:
  – A combination of mean and variance (exploitation and exploration).
• Disadvantage:
  – Dominated by the variance, so it mainly explores the input space.
• Can it eventually converge to the global optimum?
  – Yes, but it needs an almost infinite number of samples.
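Using the upper end of the 95% interval of the Gaussian posterior (the constant 1.96 below is the usual two-sided 95% choice, a convention rather than something fixed by the slides):

$$
x_{\text{MUI}} = \arg\max_{x \in \mathcal{X}} \left[ \mu(x) + 1.96\,\sigma(x) \right]
$$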

Policies: Maximum Probability of Improvement (MPI)

• Selects the sample with the highest probability of improving on the current best observation (ymax) by some margin m (see the formula below).
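Under a Gaussian posterior with mean μ(x) and standard deviation σ(x), this probability has the standard closed form (Φ is the standard normal CDF):

$$
\mathrm{PI}(x) = P\big(f(x) \ge y_{\max} + m\big) = \Phi\!\left( \frac{\mu(x) - y_{\max} - m}{\sigma(x)} \right), \qquad x_{\text{MPI}} = \arg\max_{x \in \mathcal{X}} \mathrm{PI}(x)
$$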

Policies: Maximum Probability of Improvement (MPI)

• Advantage:
  – Considers the mean, the variance, and ymax in the policy (smarter than MUI).
• Disadvantage:
  – The ad-hoc parameter m.
  – Large value of m?
    • Exploration.
  – Small value of m?
    • Exploitation.

Policies: Maximum Expected Improvement (MEI)

• Maximum expected improvement (see the formula below).
• Question: expectation over which variable?
  – The margin of improvement, m.
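Taking the expectation of the improvement over the Gaussian posterior gives the standard closed form (Φ and φ are the standard normal CDF and PDF):

$$
\mathrm{EI}(x) = \big(\mu(x) - y_{\max}\big)\,\Phi(z) + \sigma(x)\,\phi(z), \qquad z = \frac{\mu(x) - y_{\max}}{\sigma(x)}, \qquad x_{\text{MEI}} = \arg\max_{x \in \mathcal{X}} \mathrm{EI}(x)
$$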

Policies: Upper Confidence Bounds (UCB)

• Selects based on the variance and the mean of each point (see the formula below).
  – The selection of k is left to the user.
  – Recently, a principled approach to selecting this parameter has been proposed (GP-UCB).
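With a user-chosen trade-off parameter κ (the k on the slide):

$$
x_{\text{UCB}} = \arg\max_{x \in \mathcal{X}} \left[ \mu(x) + \kappa\,\sigma(x) \right]
$$

GP-UCB replaces the fixed κ with a schedule κ_t that grows slowly with the iteration count t (roughly like sqrt(log t)), which is what gives it its regret guarantees.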

Summary

• We introduced several approaches, each of which has advantages and disadvantages:
  – MM
  – MUI
  – MPI
  – MEI
  – GP-UCB

• Which one should be selected for an unknown model?

GP-Hedge

• GP-Hedge (2010).
• It selects one of the baseline policies based on theoretical results from the multi-armed bandit problem, although the objective is a bit different!
• They show that it can perform better than (or as well as) the best baseline policy in some frameworks (a rough sketch of the selection step follows below).
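A rough sketch of the hedge-style selection step, assuming each baseline policy nominates one candidate point per round and is rewarded by the posterior mean at its nominee; the function names and the reward choice are illustrative:

```python
import numpy as np

def hedge_select(gains, eta=1.0, rng=None):
    """Pick the index of one nominated point, with probability proportional to exp(eta * gain)."""
    rng = np.random.default_rng() if rng is None else rng
    p = np.exp(eta * (gains - np.max(gains)))   # subtract the max for numerical stability
    p /= p.sum()
    return rng.choice(len(gains), p=p)

def update_gains(gains, nominees, posterior_mean):
    """Reward every baseline policy by the (updated) posterior mean at its own nominee."""
    return gains + np.array([posterior_mean(x) for x in nominees])

# One BO round with a portfolio of baseline policies (MM, MUI, MPI, MEI, UCB, ...):
#   1. each baseline policy nominates one candidate point (the `nominees`)
#   2. hedge_select picks which nominee to run as the next experiment
#   3. the surrogate GP is refit with the new observation
#   4. update_gains credits each policy with the new posterior mean at its nominee
```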

Future Works

• Method selection that is smarter than GP-Hedge, with theoretical analysis.
• Batch Bayesian optimization.
• Scheduling Bayesian optimization.
