
Markov Decision Processes and Bernoulli Processes
OPER 540: Stochastic Modeling and Analysis

Sarah G. Nurre
Department of Operational Sciences/ENS
Air Force Institute of Technology

February 25, 2014 (Last Updated: February 24, 2014)


Outline

1. Markov Decision Processes
2. Counting Process
3. Bernoulli Processes


Markov Decision Processes


Definition 1

A Markov Decision Process (MDP) is described by

1. A state space S = {1, 2, . . . , N}.

2. A set of decisions: for each state i, there is a finite set of decisions D(i), with K decisions in total.

3. Transition probabilities, Pij(k) = probability of moving to state j from state i when making decision k ∈ D(i).

4. Expected rewards or costs, Cik = expected reward/cost when in state i making decision k ∈ D(i).
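These four ingredients map directly onto a data structure. Below is a minimal Python sketch of an MDP container; the class layout and field names are illustrative, not from the slides:

```python
from dataclasses import dataclass

# A minimal MDP container mirroring Definition 1.
# Field names are illustrative only.
@dataclass
class MDP:
    states: list      # S = {1, 2, ..., N}
    decisions: dict   # decisions[i] = finite decision set D(i)
    P: dict           # P[(i, k)][j] = transition probability Pij(k)
    C: dict           # C[(i, k)] = expected reward/cost Cik
```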


Example 2

Hillier and Lieberman: A manufacturer has one key machine at the core of one of its production processes. Because of heavy use, the machine deteriorates rapidly in both quality and output. Therefore, at the end of each week, a thorough inspection is done that results in classifying the condition of the machine into one of four possible states:

1. Good as new

2. Operable - minor deterioration

3. Operable - major deterioration

4. Inoperable - output of unacceptable quality


Example 2 Continued

The following matrix shows the relative frequency (probability) of each possible transition from the state in one week to the state in the following week.

        | 0   7/8   1/16   1/16 |
    P = | 0   3/4   1/8    1/8  |
        | 0   0     1/2    1/2  |
        | 0   0     0      1    |

Let {Xt, t = 0, 1, 2, . . .} be the Markov chain representing the machine state in each week.

State 4 is an absorbing state.

Repair is not feasible for this machine; replacement is the only option.


The replacement process takes 1 week to complete.

The cost of the lost production is $2,000 and the cost of replacing the machine is $4,000, resulting in a total cost of $6,000.

Even before the machine reaches state 4, costs may be incurred from the production of defective items. The expected costs per week from this source are as follows:

State   Expected Cost
1       0
2       1,000
3       3,000

Replacing the machine when it enters state 4 is an example of one policy.

The decisions associated with each state for this policy are {NR, NR, NR, R}, where NR denotes "do not replace" and R denotes "replace."


With this policy, the P matrix changes to the following:

        | 0   7/8   1/16   1/16 |
    P = | 0   3/4   1/8    1/8  |
        | 0   0     1/2    1/2  |
        | 1   0     0      0    |

To evaluate this maintenance policy, we should consider both the immediate costs incurred over the coming week and the subsequent costs that result from having the system evolve in this way.

We are interested in calculating the expected average cost per unit time.


How would we calculate this?

0·π1 + 1000·π2 + 3000·π3 + 6000·π4

I have calculated that

  - π1 = 2/13
  - π2 = 7/13
  - π3 = 2/13
  - π4 = 2/13

which results in a total cost of $1,923.08.
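These values are easy to verify numerically. A small sketch (assuming numpy) that recomputes the stationary distribution of the policy-modified P and the resulting average weekly cost:

```python
import numpy as np

# P under the policy {NR, NR, NR, R}: replace only when in state 4.
P = np.array([
    [0, 7/8, 1/16, 1/16],
    [0, 3/4, 1/8,  1/8 ],
    [0, 0,   1/2,  1/2 ],
    [1, 0,   0,    0   ],
])
costs = np.array([0, 1000, 3000, 6000])  # expected weekly cost per state

# Solve pi = pi P together with sum(pi) = 1 for the stationary distribution.
A = np.vstack([P.T - np.eye(4), np.ones(4)])
b = np.array([0, 0, 0, 0, 1])
pi = np.linalg.lstsq(A, b, rcond=None)[0]

print(pi)          # approx [2/13, 7/13, 2/13, 2/13]
print(pi @ costs)  # approx 1923.08
```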


Are we done? Is this the best policy?

There are other policies that should be considered and compared to this one.

For example, we could overhaul the system, a type of repair that returns it to state 2.

The following are the sets of decisions applicable to each state:

Decision   Action       Relevant States
1          Do nothing   1, 2, 3
2          Overhaul     3
3          Replace      2, 3, 4


The associated costs for each state and decision are given as follows:

State   Decision   Cost
1       1          0
2       1          1,000
2       3          6,000
3       1          3,000
3       2          4,000
3       3          6,000
4       3          6,000


Based on these decisions, we can create different policies and their associated P matrices.

Policies

  A. Replace in state 4
  B. Replace in state 4, overhaul in state 3
  C. Replace in states 3 and 4
  D. Replace in states 2, 3, and 4

Create the P matrix for each of these policies (a construction sketch follows below).
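One compact way to build all four matrices is to start from the do-nothing matrix and overwrite the rows where the policy intervenes. A sketch, assuming (per the description above) that overhaul returns the machine to state 2 and replacement returns it to state 1, each with certainty:

```python
import numpy as np

# Row i gives the weekly transition probabilities when we do nothing in state i.
P0 = np.array([
    [0, 7/8, 1/16, 1/16],
    [0, 3/4, 1/8,  1/8 ],
    [0, 0,   1/2,  1/2 ],
    [0, 0,   0,    1   ],
])

def policy_matrix(overhaul_states=(), replace_states=()):
    """Overwrite the rows of the do-nothing matrix where the policy intervenes."""
    P = P0.copy()
    for s in overhaul_states:
        P[s - 1] = [0, 1, 0, 0]  # overhaul -> state 2 with certainty
    for s in replace_states:
        P[s - 1] = [1, 0, 0, 0]  # replace -> state 1 with certainty
    return P

P_A = policy_matrix(replace_states=(4,))
P_B = policy_matrix(overhaul_states=(3,), replace_states=(4,))
P_C = policy_matrix(replace_states=(3, 4))
P_D = policy_matrix(replace_states=(2, 3, 4))
```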


Of these policies, we would pick the one that costs the least (maximizes the negative reward).

One should note that these are stationary policies, meaning whenever the system is in state i, the rule for making the decision is the same regardless of the value of the current time t.

Further, these are deterministic policies, meaning whenever the system is in state i, the rule for making the decision definitely chooses one particular decision.

There are also randomized policies, where a probability distribution is used to select the decision to be made.


Linear Programming for MDPs (woo!)

By defining variables and parameters appropriately, we can use linear programming to determine the optimal policy!

For each policy, define a (number of states × number of decisions) matrix where

    Dik = 1 if decision k is to be made in state i, and 0 otherwise.   (1)

Create such a matrix for policy B (one reading is worked out below).
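As a worked check, my reading of policy B is: do nothing in states 1 and 2, overhaul in state 3, replace in state 4. With rows indexed by state 1-4 and columns by decision 1-3, that gives

        | 1   0   0 |
    D = | 1   0   0 |
        | 0   1   0 |
        | 0   0   1 |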


Define decision variable yik to be the steady-state unconditional probability that the system is in state i and decision k is made.

For example, in this problem we need to define

  - y11, y21, y31 for decision 1, which applies to states 1, 2, 3 (do nothing)
  - y32 for decision 2, which applies to state 3 (overhaul)
  - y23, y33, y43 for decision 3, which applies to states 2, 3, 4 (replace)

What is the relationship between yik and πi?

  - πi = ∑_{k=1}^{K} yik, where K is the total number of decisions
  - For example, π2 = y21 + y23


We know ∑_{i=1}^{∞} πi = 1. How do we translate this to yik?

  - ∑_{i=1}^{∞} ∑_{k=1}^{K} yik = 1
  - y11 + y21 + y31 + y32 + y23 + y33 + y43 = 1

We add this as a constraint to our model.

We also know πj = ∑_{i=1}^{∞} πi Pij. How does this translate to yik?

  - ∑_{k=1}^{K} yjk = ∑_{i=1}^{∞} ∑_{k=1}^{K} yik Pij(k)
  - Write out one such equation for our example (one worked case follows below).
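For instance, take j = 1. Decision 1 is the only decision available in state 1, and the only transitions into state 1 come from replacement (decision 3, which returns the machine to state 1 with probability 1), so

    y11 = y23 + y33 + y43,   i.e.,   y11 − (y23 + y33 + y43) = 0,

which is exactly the state-1 constraint in the full model a few slides ahead.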


We also add this constraint to the model; we now have two constraints.

The last constraint we need restricts each yik ≥ 0.

Now we need an objective function. With an MDP, what are we trying to accomplish?

Minimize Cost or Maximize Reward

Let Cik represent the expected cost of being in state i with decision k.

Our objective function is ∑_{i=1}^{∞} ∑_{k=1}^{K} Cik yik.


Linear Programming for MDPs - Example 2

min   1000 y21 + 6000 y23 + 3000 y31 + 4000 y32 + 6000 y33 + 6000 y43

s.t.  y11 + y21 + y31 + y32 + y23 + y33 + y43 = 1

      y11 − (y23 + y33 + y43) = 0

      y21 + y23 − (7/8 y11 + 3/4 y21 + y32) = 0

      y31 + y32 + y33 − (1/16 y11 + 1/8 y21 + 1/2 y31) = 0

      y43 − (1/16 y11 + 1/8 y21 + 1/2 y31) = 0

      y11, y21, y31, y32, y23, y33, y43 ≥ 0
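As a cross-check on the Excel Solver solution discussed next, this LP can also be solved programmatically. A sketch assuming scipy; the variable ordering and the expected outputs in the comments are my own annotations:

```python
import numpy as np
from scipy.optimize import linprog

# Variable order: y11, y21, y23, y31, y32, y33, y43
c = np.array([0, 1000, 6000, 3000, 4000, 6000, 6000])

A_eq = np.array([
    [1,     1,       1,  1,       1,  1,  1],  # probabilities sum to 1
    [1,     0,      -1,  0,       0, -1, -1],  # balance equation, state 1
    [-7/8,  1 - 3/4, 1,  0,      -1,  0,  0],  # balance equation, state 2
    [-1/16, -1/8,    0,  1 - 1/2, 1,  1,  0],  # balance equation, state 3
    [-1/16, -1/8,    0, -1/2,     0,  0,  1],  # balance equation, state 4
])
b_eq = np.array([1, 0, 0, 0, 0])

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 7)
print(res.x)    # approx [2/21, 5/7, 0, 0, 2/21, 0, 2/21]
print(res.fun)  # approx 1666.67, the minimum average weekly cost
```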


This is a small model, so we can solve it using Excel Solver.

The optimal solution is:

  - y11 = 2/21
  - (y21, y23) = (5/7, 0)
  - (y31, y32, y33) = (0, 2/21, 0)
  - y43 = 2/21

From this solution, we can see that for each state, we get one decision with a positive steady-state value.

Indeed, yik > 0 for only one k for each state i, so the LP yields a single decision for every state.


Linear Programming for MDPs

In general, the LP formulation of an MDP takes the following form:

min(cost) / max(reward)   ∑_{i=1}^{∞} ∑_{k=1}^{K} Cik yik

s.t.  ∑_{i=1}^{∞} ∑_{k=1}^{K} yik = 1

      ∑_{k=1}^{K} yjk − ∑_{i=1}^{∞} ∑_{k=1}^{K} yik Pij(k) = 0   ∀j

      yik ≥ 0   ∀i, k


Counting Process


Definition 3

A stochastic process {N(t), t ≥ 0} is said to be a counting process if N(t) denotes the number of occurrences of some event of interest that have occurred in the interval [0, t].

Note that "time zero" and the "measure of time" (minutes, days, etc.) need to be specified. Note also that time may be measured in discrete steps (e.g., items, people, etc.).


Example 4

The following are examples of counting processes.

1. {N(t), t ≥ 0} where N(t) = number of earthquakes in California

2. {N(t), t ≥ 0} where N(t) = number of births at a hospital

3. {N(t), t ≥ 0} where N(t) = number of hamburgers sold at a restaurant

4. {N(t), t ≥ 0} where N(t) = number of machine failures in a facility

5. {N(t), t ≥ 0} where N(t) = number of lost sales due to stock-outs in an inventory system

6. {N(t), t ≥ 0} where N(t) = number of defectives found in the first t items inspected

7. {N(t), t ≥ 0} where N(t) = number of "yes" votes in the first t people polled


Theorem 5

Let {N(t), t ≥ 0} be a counting process. Then:

N(t) ≥ 0 ∀t ≥ 0

N(t) is integer-valued ∀t ≥ 0

If s < t then N(s) ≤ N(t)

For s < t, N(t)− N(s) is the number of events in (s, t].


Example 6

The following are not examples of counting processes.

{N(t), t ≥ 0} where N(t) = population of Ohio

{N(t), t ≥ 0} where N(t) = number of customers in the restaurant

{N(t), t ≥ 0} where N(t) = number of machines operating in a facility

{N(t), t ≥ 0} where N(t) = number of customers waiting in line

Each of these quantities can decrease over time, violating the monotonicity property in Theorem 5.


Definition 7

Let {N(t), t ≥ 0} be a counting process. Suppose 0 ≤ t1 < t2 ≤ t3 < t4. Then {N(t), t ≥ 0} is said to have independent increments if N(t2) − N(t1) and N(t4) − N(t3) are independent random variables.

Definition 8

Let {N(t), t ≥ 0} be a counting process. Suppose s ≥ 0 and x > 0. Then {N(t), t ≥ 0} is said to have stationary increments if the probability distribution of N(s + x) − N(s) is independent of s.


Bernoulli Processes


Definition 9

Consider a random experiment having two possible outcomes that are referred to as "success" and "failure". Let p denote the probability of success. Then the random experiment is referred to as a Bernoulli trial having success probability p.

Definition 10

Consider a sequence of independent Bernoulli trials with each trial having probability of success p. Let N(t) denote the number of successes in the first t trials. Then {N(t), t = 1, 2, . . .} is said to be a Bernoulli process. This fact is denoted by {N(t), t = 1, 2, . . .} ∼ BP(p).


Example 11

The following are examples of Bernoulli processes.

1. Suppose a manufacturing process is such that 2% of products are defective. Let N(t) denote the number of the first t items produced that are defective.
   - BP(0.02)

2. Suppose 18% of all people are born on a Monday. Let N(t) denote the number of the first t people to enter a room that were born on a Monday.
   - BP(0.18)

3. Suppose 16% of all customers at a restaurant order iced tea. Let N(t) denote the number of the first t customers that order iced tea.
   - BP(0.16)


Theorem 12

Let {N(t), t = 1, 2, . . .} be a Bernoulli process having probability of success p. Then N(t) ∼ binomial(t, p).
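A quick empirical check of Theorem 12; the parameter values and sample sizes here are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
p, t, reps = 0.3, 20, 100_000

# Simulate N(t) for a BP(p): count successes in the first t Bernoulli trials.
N_t = rng.binomial(1, p, size=(reps, t)).sum(axis=1)

# Compare simulated mean/variance with binomial(t, p): tp and tp(1 - p).
print(N_t.mean(), t * p)            # both approx 6.0
print(N_t.var(), t * p * (1 - p))   # both approx 4.2
```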


Definition 13

Consider a sequence of independent Bernoulli trials with each trial having probability of success p. Let T(k) denote the number of trials until the k-th success, where k ∈ {1, 2, . . .}. Then T(k) is a negative binomial random variable with parameters k and p. Note that T(1) is a geometric random variable with parameter p.

Note: previously we discussed negative binomial random variables and stated that for a negative binomial random variable X, E[X] = r(1 − p)/p. This is true when X is defined as the number of failures before the r-th success, meaning X + r total trials are required.

When X is defined as the number of trials at which the r-th success occurs, E[X] = r/p.
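The distinction matters: for k = 3 and p = 0.5, E[T(3)] = 3/0.5 = 6 trials, while the failures-only count has mean 3(1 − 0.5)/0.5 = 3. A simulation sketch (numpy's convention is the failures-only definition):

```python
import numpy as np

rng = np.random.default_rng(0)
k, p, reps = 3, 0.5, 100_000

# numpy's negative_binomial returns the number of FAILURES before the k-th
# success, so its mean is k(1-p)/p; add k to get trials until the k-th success.
failures = rng.negative_binomial(k, p, size=reps)
print(failures.mean())        # approx k*(1-p)/p = 3.0
print((failures + k).mean())  # approx k/p = 6.0, i.e., E[T(k)]
```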
