
The Knowledge Gradient for Sequential Decision Making with Stochastic Binary Feedbacks

Yingfei Wang, Chu Wang and Warren B. Powell

Princeton University

Optimal Learning Methods, June 22, 2016


Motivation

Sequential Decision Problems

• M discrete alternatives

• Unknown truth µ_x

• At each time n, the learner chooses an alternative x^n and receives a reward W^n_{x^n}.

• Offline objective: max_π E^π[µ_{x^N}]
• Online objective: max_π E^π[ ∑_{n=0}^{N−1} µ_{x^n} ]

Overview

• Numerous communities:
  • Multi-armed bandits
  • Ranking and selection
  • Stochastic search
  • Control theory
  • ...

• Various applications:
  • Recommendations: ads, news
  • Packet routing
  • Revenue management
  • Laboratory experiments guidance
  • ...


Applications with binary outputs

• Revenue management: whether or not a customer books a room.

• Health analytics: success (the patient does not need to return for more treatment) or failure (the patient does need follow-up care).

• Production of single- or double-walled nanotubes: controllable parameters include catalyst, laser power, hydrogen, pressure, temperature, Ar/CO2, ethylene, etc.


Figure: single-walled vs. double-walled nanotubes.


Outline

1 Sequential Decision Problems with Binary Outputs

2 The Knowledge Gradient Policy

3 Experimental Results


Layout

1 Sequential Decision Problems with Binary Outputs
  • Model
  • Bayesian linear classification and Laplace approximation

2 The Knowledge Gradient Policy

3 Experimental Results


Model

• A finite set of alternatives x ∈ X = {x_1, . . . , x_M}.
• Binary outcome y ∈ {−1, +1} with unknown probability p(y = +1 | x).

• Goal: given a limited budget N, choose the measurement policy (x^0, . . . , x^{N−1}) and the implementation decision that maximizes p(y = +1 | x).

• Generalized linear model for the probability:

  p(y = +1 | x, w) = σ(w^T x),

  where σ(a) = 1 / (1 + exp(−a)) (logistic) or σ(a) = Φ(a) = ∫_{−∞}^{a} N(s | 0, 1^2) ds (probit).
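As an illustration of this model (not taken from the talk), here is a minimal Python sketch of the two link functions and the resulting class probability; the weight vector w and the alternative x below are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def sigma_logistic(a):
    """Logistic link: sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def sigma_probit(a):
    """Probit link: sigma(a) = Phi(a), the standard normal CDF."""
    return norm.cdf(a)

def prob_positive(x, w, link=sigma_logistic):
    """p(y = +1 | x, w) under the generalized linear model."""
    return link(np.dot(w, x))

# Hypothetical weight vector and alternative.
w = np.array([0.5, -1.0, 0.2])
x = np.array([1.0, 0.3, 2.0])
print(prob_positive(x, w), prob_positive(x, w, link=sigma_probit))
```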


Logistic and probit regression

• Training set D = {(x_i, y_i)}_{i=1}^{n}.
• Likelihood: p(D | w) = ∏_{i=1}^{n} σ(y_i · w^T x_i).
• Point estimate: ŵ = argmin_w ∑_{i=1}^{n} − log σ(y_i · w^T x_i).

Bayesian logistic and probit regression

• Posterior: p(w | D) = (1/Z) p(D | w) p(w) ∝ p(w) ∏_{i=1}^{n} σ(y_i · w^T x_i).
• Extended to sequential model updates:
  p(w | D^0) --(x^0, y^1)--> p(w | D^1) --(x^1, y^2)--> p(w | D^2) · · ·
• Exact Bayesian inference for the linear classifier is intractable.

• Monte Carlo sampling or analytic approximations to the posterior: Laplace approximation.


Laplace approximation

• Ψ(w) = log p(D|w) + log p(w).

• Second-order Taylor expansion of Ψ around its MAP (maximum a posteriori) solution ŵ = argmax_w Ψ(w):

  Ψ(w) ≈ Ψ(ŵ) − (1/2)(w − ŵ)^T H (w − ŵ), where H = −∇²Ψ(w)|_{w=ŵ}.

• Laplace approximation to the posterior: p(w | D) ≈ N(w | ŵ, H^{−1}).
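A minimal sketch of this Laplace approximation, assuming a logistic link and an isotropic Gaussian prior with an assumed precision `prior_prec` (a hyperparameter not specified on the slides); the toy data at the end are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def laplace_posterior(X, y, prior_prec=1.0):
    """Laplace approximation N(w_hat, H^{-1}) to the posterior of Bayesian
    logistic regression with prior w ~ N(0, prior_prec^{-1} I).
    X: (n, d) design matrix, y: labels in {-1, +1}."""
    n, d = X.shape

    def neg_psi(w):
        # -Psi(w) = -log p(D|w) - log p(w), up to constants
        ll = np.sum(np.log(sigmoid(y * (X @ w))))
        lp = -0.5 * prior_prec * (w @ w)
        return -(ll + lp)

    w_hat = minimize(neg_psi, np.zeros(d), method="BFGS").x
    # H = -grad^2 Psi(w_hat) = sum_i s_i(1 - s_i) x_i x_i^T + prior_prec * I
    s = sigmoid(X @ w_hat)
    H = (X * (s * (1 - s))[:, None]).T @ X + prior_prec * np.eye(d)
    return w_hat, np.linalg.inv(H)   # mean and covariance of N(w | w_hat, H^{-1})

# Hypothetical toy data.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = np.sign(X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20))
w_hat, cov = laplace_posterior(X, y)
```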


Online Bayesian linear classification based on Laplace approximation

• Extended to sequential model updates: the Laplace-approximated posterior serves as the prior for the next observation.

• Independent Gaussian prior on each weight: p(w_j) = N(w_j | m^0_j, (q^0_j)^{−1}).
• One-step update: (m^n_j, q^n_j) --(x^n, y^{n+1})--> (m^{n+1}_j, q^{n+1}_j).
• With t := ∂² log σ(y w^T x) / ∂f² |_{f = w^T x}:

  m^{n+1} = argmax_w [ −(1/2) ∑_{i=1}^{d} q^n_i (w_i − m^n_i)² + log σ(y w^T x) ]

  q^{n+1}_j = q^n_j − t x_j²
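A sketch of this one-step online update under the same assumptions as above (logistic link, diagonal precision). A generic optimizer stands in for the one-dimensional bisection solve on the next slide, and evaluating t at the new mean m^{n+1} is my reading of the slide.

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def online_laplace_update(m, q, x, y):
    """One-step update of an independent Gaussian belief w_j ~ N(m_j, 1/q_j)
    after observing (x, y) with y in {-1, +1}."""
    def neg_obj(w):
        return 0.5 * np.sum(q * (w - m) ** 2) - np.log(sigmoid(y * (w @ x)))

    m_new = minimize(neg_obj, m, method="BFGS").x
    f = m_new @ x
    t = -sigmoid(f) * (1.0 - sigmoid(f))   # d^2/df^2 log sigma(y f) for the logistic link
    q_new = q - t * x ** 2                 # q_j^{n+1} = q_j^n - t * x_j^2
    return m_new, q_new

# Hypothetical single observation.
m, q = np.zeros(3), np.ones(3)
m, q = online_laplace_update(m, q, np.array([1.0, -0.5, 2.0]), +1)
```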


Online Bayesian linear classification based on Laplace approximation

argmax_w  −(1/2) ∑_{i=1}^{d} q_i (w_i − m_i)² + log σ(y w^T x).

• 1-dimensional bisection method: setting ∂Ψ/∂w_i = 0 and defining p := σ′(y w^T x) / σ(y w^T x), we get w_i = m_i + y p x_i / q_i. Substituting back gives the scalar fixed-point equation

  p = σ′( p ∑_{i=1}^{d} x_i²/q_i + y m^T x ) / σ( p ∑_{i=1}^{d} x_i²/q_i + y m^T x ),

  which has a unique solution in the interval [0, σ′(y m^T x)/σ(y m^T x)].
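A sketch of the bisection solve, assuming a logistic link (for which σ′(a)/σ(a) = σ(−a)); the belief (m, q) and the observation (x, y) are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def ratio(a):
    # sigma'(a) / sigma(a) equals sigma(-a) for the logistic link
    return sigmoid(-a)

def map_by_bisection(m, q, x, y, tol=1e-10):
    """Solve the scalar fixed-point equation for p by bisection, then
    recover the MAP weights w_i = m_i + y * p * x_i / q_i."""
    c = np.sum(x ** 2 / q)           # sum_i x_i^2 / q_i
    b = y * (m @ x)                  # y * m^T x
    lo, hi = 0.0, ratio(b)           # unique root lies in [0, sigma'(b)/sigma(b)]
    while hi - lo > tol:
        p = 0.5 * (lo + hi)
        if p < ratio(p * c + b):     # root is to the right of p
            lo = p
        else:
            hi = p
    p = 0.5 * (lo + hi)
    return m + y * p * x / q

# Hypothetical call with the same belief as in the previous sketch.
w_map = map_by_bisection(np.zeros(3), np.ones(3), np.array([1.0, -0.5, 2.0]), +1)
```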


Layout

1 Sequential Decision Problems with Binary Outputs

2 The Knowledge Gradient Policy
  • Knowledge Gradient Policy for Lookup Table Model
  • Knowledge Gradient Policy for Linear Bayesian Classification

3 Experimental Results


Model

Characteristics of our problems

• Expensive experiments.

• Small samples.

• Requiring that we learn from our decisions as quickly as possible.


Model

Knowledge gradient policy for lookup table model [3]

• M discrete alternatives, unknown truth µ_x, observations W_x = µ_x + ε
• µ_x | F^n ∼ N(θ^n_x, σ^n_x)
• Knowledge state S^n = (θ^n, σ^n), with V(s) = max_x θ_x

ν^KG_x(S^n) = E[ V(S^{n+1}(x)) − V(S^n) ] = E[ max_{x′} θ^{n+1}_{x′}(x) − max_{x′} θ^n_{x′} | S^n ].

• The Knowledge Gradient (KG) policy: X^KG(S^n) = argmax_x ν^KG_x(S^n).

Figure: the change in the estimate of the value of alternative 5 due to a measurement, and the change that produces a change in the decision.
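To make the definition above concrete, here is a Monte Carlo sketch (not the closed-form expression of [3]) that estimates ν^KG_x by sampling the predictive distribution of θ^{n+1}_x. It assumes the σ^n_x are standard deviations and that the measurement-noise standard deviation `sigma_w` is known; both are assumptions for illustration, not statements from the slides.

```python
import numpy as np

def kg_lookup_mc(theta, sigma, sigma_w, n_samples=100_000, rng=None):
    """Monte Carlo estimate nu[x] = E[max_x' theta^{n+1}_x'(x)] - max_x' theta^n_x'
    under independent Gaussian beliefs mu_x ~ N(theta_x, sigma_x^2) and
    Gaussian measurement noise with standard deviation sigma_w."""
    rng = np.random.default_rng() if rng is None else rng
    best_now = np.max(theta)
    # Std. dev. of the predictive change in theta_x if x is measured (the usual "sigma tilde").
    sigma_tilde = sigma ** 2 / np.sqrt(sigma ** 2 + sigma_w ** 2)
    nu = np.empty(len(theta))
    for x in range(len(theta)):
        z = rng.standard_normal(n_samples)
        theta_next_x = theta[x] + sigma_tilde[x] * z          # predictive draws for alternative x
        others = np.max(np.delete(theta, x)) if len(theta) > 1 else -np.inf
        nu[x] = np.mean(np.maximum(theta_next_x, others)) - best_now
    return nu

# Hypothetical beliefs over three alternatives.
theta = np.array([0.2, 0.5, 0.45])
sigma = np.array([0.6, 0.1, 0.4])
x_kg = int(np.argmax(kg_lookup_mc(theta, sigma, sigma_w=0.5)))   # X^KG(S^n)
```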


Model

Knowledge gradient policy for linear Bayesian classification belief model

• y_x | w ∼ Bernoulli(σ(w^T x))
• w_j | F^n ∼ N(m^n_j, (q^n_j)^{−1})
• Knowledge state S^n = (m^n, q^n)
• V(s) = max_x p(y_x = +1 | x, s)

ν^KG_x(S^n) = E[ V(S^{n+1}(x, y)) − V(S^n) | S^n ]
            = E[ max_{x′} p(y_{x′} = +1 | x′, S^{n+1}(x, y)) − max_{x′} p(y_{x′} = +1 | x′, S^n) | S^n ]

• The Knowledge Gradient (KG) policy: X^KG(S^n) = argmax_x ν^KG_x(S^n).

• The knowledge gradient policy can work with any choice of link function σ(·) and approximation procedure by adjusting the transition function S^{n+1}(x, ·) accordingly.
• Online learning [7]: X^OLKG(S^n) = argmax_x [ p(y = +1 | x, S^n) + (N − n) ν^KG_x(S^n) ].
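A sketch of the KG computation for the classification belief model, under the same assumptions as the earlier sketches (logistic link, diagonal-precision Laplace update). For simplicity it uses the plug-in predictive probability σ(m^T x) rather than a marginalized predictive, so it illustrates the structure of the computation rather than the exact quantities in the talk. The last two lines show the KG and online-KG (OLKG) decisions.

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def laplace_step(m, q, x, y):
    """One online Laplace update with diagonal precision, as in the earlier sketch."""
    neg = lambda w: 0.5 * np.sum(q * (w - m) ** 2) - np.log(sigmoid(y * (w @ x)))
    m_new = minimize(neg, m, method="BFGS").x
    f = m_new @ x
    q_new = q + sigmoid(f) * (1 - sigmoid(f)) * x ** 2
    return m_new, q_new

def kg_classification(m, q, X):
    """nu[x] = E_y[ max_x' p(y=+1 | x', S^{n+1}(x, y)) ] - max_x' p(y=+1 | x', S^n),
    averaging over y in {-1, +1} with plug-in predictive probabilities."""
    best_now = np.max(sigmoid(X @ m))
    nu = np.empty(len(X))
    for i, x in enumerate(X):
        val = 0.0
        for y in (+1, -1):
            p_y = sigmoid(y * (m @ x))           # predictive probability of outcome y
            m_y, q_y = laplace_step(m, q, x, y)  # transition S^{n+1}(x, y)
            val += p_y * np.max(sigmoid(X @ m_y))
        nu[i] = val - best_now
    return nu

# Hypothetical alternatives and belief state.
X = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
m, q = np.zeros(2), np.ones(2)
n, N = 0, 10
nu = kg_classification(m, q, X)
x_kg = int(np.argmax(nu))                                   # X^KG(S^n)
x_olkg = int(np.argmax(sigmoid(X @ m) + (N - n) * nu))      # X^OLKG(S^n)
```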


Layout

1 Sequential Decision Problems with Binary Outputs

2 The Knowledge Gradient Policy

3 Experimental Results
  • Behavior of the KG policy
  • Comparison with other policies


Sampling behavior of the KG policy

Figure: six snapshots of the measurements chosen by the KG policy in the (x1, x2) plane.


Absolute class distribution error

Figure: absolute class distribution error, shown for a two-dimensional example over (x1, x2) and a three-dimensional example over (x1, x2, x3).


Competing policies

• random sampling (Random)

• a myopic method that selects the most uncertain instance at each step (MostUncertain)

• discriminative batch-mode active learning (Disc) [4] with batch size set to 1

• expected improvement (EI) [8] with an initial fit of 5 examples

• Thompson sampling (TS) [2]

• UCB on the latent function w^T x (UCB) [6]

Metric

Opportunity Cost (OC)

OC := max_{x∈X} p(y = +1 | x, w*) − p(y = +1 | x^{N+1}, w*).
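A small sketch of the opportunity-cost metric, where the true weights w*, the alternatives X, and the final belief mean are all hypothetical; x^{N+1} is taken here to be the apparent best alternative under the final belief.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def opportunity_cost(w_true, X, x_impl):
    """OC = max_x p(y=+1 | x, w*) - p(y=+1 | x_impl, w*), using the true weights w*."""
    p_true = sigmoid(X @ w_true)
    return float(np.max(p_true) - sigmoid(x_impl @ w_true))

# Hypothetical truth, alternatives, and final posterior mean.
w_true = np.array([1.0, -0.5])
X = np.array([[1.0, 0.0], [0.0, 1.0], [0.8, 0.2]])
m_final = np.array([0.6, -0.1])
x_impl = X[int(np.argmax(sigmoid(X @ m_final)))]   # implementation decision x^{N+1}
print(opportunity_cost(w_true, X, x_impl))
```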


Comparison with other Policies

Figure: opportunity cost versus the number of iterations (5 to 30) on six UCI datasets: (a) sonar, (b) glass identification, (c) blood transfusion, (d) survival, (e) breast cancer (wpbc), (f) planning relax. Policies compared: EI, UCB, TS, Random, MostUncertain, Disc, and KG.


Thank you! Questions?


References

[1] Xavier Boyen and Daphne Koller. Tractable inference for complex stochastic processes. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, pages 33–42. Morgan Kaufmann Publishers Inc., 1998.

[2] Olivier Chapelle and Lihong Li. An empirical evaluation of Thompson sampling. In Advances in Neural Information Processing Systems, pages 2249–2257, 2011.

[3] Peter I. Frazier, Warren B. Powell, and Savas Dayanik. A knowledge-gradient policy for sequential information collection. SIAM Journal on Control and Optimization, 47(5):2410–2439, 2008.

[4] Yuhong Guo and Dale Schuurmans. Discriminative batch mode active learning. In Advances in Neural Information Processing Systems, pages 593–600, 2008.

[5] Steffen L. Lauritzen. Propagation of probabilities, means, and variances in mixed graphical association models. Journal of the American Statistical Association, 87(420):1098–1108, 1992.

[6] Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th International Conference on World Wide Web, pages 661–670. ACM, 2010.


[7] Ilya O. Ryzhov, Warren B. Powell, and Peter I. Frazier. The knowledge gradient algorithm for a general class of online learning problems. Operations Research, 60(1):180–195, 2012.

[8] Matthew Tesch, Jeff Schneider, and Howie Choset. Expensive function optimization with stochastic binary outcomes. In Proceedings of the 30th International Conference on Machine Learning, pages 1283–1291, 2013.
