
Markov Decision Processes and Bernoulli Processes
OPER 540: Stochastic Modeling and Analysis

Sarah G. Nurre
Department of Operational Sciences/ENS
Air Force Institute of Technology

February 25, 2014 (Last Updated: February 24, 2014)


Outline

1. Markov Decision Processes
2. Counting Process
3. Bernoulli Processes


Markov Decision Processes


Definition 1

A Markov Decision Process (MDP) is described by

1. A state space S = {1, 2, . . . , N}.

2. A set of decisions: for each state i, there is a finite set of decisions D(i), with K decisions in total.

3. Transition probabilities, Pij(k) = probability of moving to state j from state i when making decision k ∈ D(i).

4. Expected rewards or costs, Cik = expected reward/cost when in state i making decision k ∈ D(i).
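These four ingredients map directly onto a data structure. Below is a minimal Python sketch of an MDP container; the class layout and field names are illustrative, not from the slides:

```python
from dataclasses import dataclass

# A minimal MDP container mirroring Definition 1.
# Field names are illustrative only.
@dataclass
class MDP:
    states: list      # S = {1, 2, ..., N}
    decisions: dict   # decisions[i] = finite decision set D(i)
    P: dict           # P[(i, k)][j] = transition probability Pij(k)
    C: dict           # C[(i, k)] = expected reward/cost Cik
```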


Example 2

Hillier and Lieberman: A manufacturer has one key machine at the core of one of its production processes. Because of heavy use, the machine deteriorates rapidly in both quality and output. Therefore, at the end of each week, a thorough inspection is done that results in classifying the condition of the machine into one of four possible states:

1. Good as new

2. Operable - minor deterioration

3. Operable - major deterioration

4. Inoperable - output of unacceptable quality


Example 2 Continued

The following matrix shows the relative frequency (probability) of each possible transition from the state in one week to the state in the following week.

        | 0   7/8   1/16   1/16 |
    P = | 0   3/4   1/8    1/8  |
        | 0   0     1/2    1/2  |
        | 0   0     0      1    |

Let {Xt, t = 0, 1, 2, . . .} be the Markov chain representing the machine state in each week.

State 4 is an absorbing state.

Repair is not feasible for this machine; replacement is the only option.


The replacement process takes 1 week to complete.

The cost of the lost production is $2,000 and the cost of replacing the machine is $4,000, resulting in a total cost of $6,000.

Even before the machine reaches state 4, costs may be incurred from the production of defective items. The expected costs per week from this source are as follows:

State   Expected Cost
1       0
2       1,000
3       3,000

Replacing the machine when it enters state 4 is an example of one policy.

The decisions associated with each state for this policy are {NR, NR, NR, R}, where NR denotes "do not replace" and R denotes "replace."


With this policy, the P matrix changes to the following:

        | 0   7/8   1/16   1/16 |
    P = | 0   3/4   1/8    1/8  |
        | 0   0     1/2    1/2  |
        | 1   0     0      0    |

To evaluate this maintenance policy, we should consider both the immediate costs incurred over the coming week and the subsequent costs that result from having the system evolve in this way.

We are interested in calculating the expected average cost per unit time.


How would we calculate this?

0·π1 + 1000·π2 + 3000·π3 + 6000·π4

I have calculated that

  - π1 = 2/13
  - π2 = 7/13
  - π3 = 2/13
  - π4 = 2/13

which results in a total cost of $1,923.08.
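These values are easy to verify numerically. A small sketch (assuming numpy) that recomputes the stationary distribution of the policy-modified P and the resulting average weekly cost:

```python
import numpy as np

# P under the policy {NR, NR, NR, R}: replace only when in state 4.
P = np.array([
    [0, 7/8, 1/16, 1/16],
    [0, 3/4, 1/8,  1/8 ],
    [0, 0,   1/2,  1/2 ],
    [1, 0,   0,    0   ],
])
costs = np.array([0, 1000, 3000, 6000])  # expected weekly cost per state

# Solve pi = pi P together with sum(pi) = 1 for the stationary distribution.
A = np.vstack([P.T - np.eye(4), np.ones(4)])
b = np.array([0, 0, 0, 0, 1])
pi = np.linalg.lstsq(A, b, rcond=None)[0]

print(pi)          # approx [2/13, 7/13, 2/13, 2/13]
print(pi @ costs)  # approx 1923.08
```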


Are we done? Is this the best policy?

There are other policies that should be considered and compared to this one.

For example, we could overhaul the system, a type of repair that returns it to state 2.

The following are the sets of decisions applicable to each state:

Decision   Action       Relevant States
1          Do nothing   1, 2, 3
2          Overhaul     3
3          Replace      2, 3, 4


The associated costs for each state and decision are given as follows:

State   Decision   Cost
1       1          0
2       1          1,000
2       3          6,000
3       1          3,000
3       2          4,000
3       3          6,000
4       3          6,000


Based on these decisions, we can create different policies and their associated P matrices.

Policies

  A. Replace in state 4
  B. Replace in state 4, overhaul in state 3
  C. Replace in states 3 and 4
  D. Replace in states 2, 3, and 4

Create the P matrix for each of these policies (a construction sketch follows below).
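One compact way to build all four matrices is to start from the do-nothing matrix and overwrite the rows where the policy intervenes. A sketch, assuming (per the description above) that overhaul returns the machine to state 2 and replacement returns it to state 1, each with certainty:

```python
import numpy as np

# Row i gives the weekly transition probabilities when we do nothing in state i.
P0 = np.array([
    [0, 7/8, 1/16, 1/16],
    [0, 3/4, 1/8,  1/8 ],
    [0, 0,   1/2,  1/2 ],
    [0, 0,   0,    1   ],
])

def policy_matrix(overhaul_states=(), replace_states=()):
    """Overwrite the rows of the do-nothing matrix where the policy intervenes."""
    P = P0.copy()
    for s in overhaul_states:
        P[s - 1] = [0, 1, 0, 0]  # overhaul -> state 2 with certainty
    for s in replace_states:
        P[s - 1] = [1, 0, 0, 0]  # replace -> state 1 with certainty
    return P

P_A = policy_matrix(replace_states=(4,))
P_B = policy_matrix(overhaul_states=(3,), replace_states=(4,))
P_C = policy_matrix(replace_states=(3, 4))
P_D = policy_matrix(replace_states=(2, 3, 4))
```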


Of these policies, we would pick the one that costs the least (maximizes the negative reward).

One should note that these are stationary policies, meaning whenever the system is in state i, the rule for making the decision is the same regardless of the value of the current time t.

Further, these are deterministic policies, meaning whenever the system is in state i, the rule for making the decision definitely chooses one particular decision.

There are also randomized policies, where a probability distribution is used to select the decision to be made.


Linear Programming for MDPs (woo!)

By defining variables and parameters appropriately, we can use linear programming to determine the optimal policy!

For each policy, define a (number of states × number of decisions) matrix where

    Dik = 1 if decision k is to be made in state i, and 0 otherwise.   (1)

Create such a matrix for policy B (one reading is worked out below).
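As a worked check, my reading of policy B is: do nothing in states 1 and 2, overhaul in state 3, replace in state 4. With rows indexed by state 1-4 and columns by decision 1-3, that gives

        | 1   0   0 |
    D = | 1   0   0 |
        | 0   1   0 |
        | 0   0   1 |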


Define decision variable yik to be the steady-state unconditional probability that the system is in state i and decision k is made.

For example, in this problem we need to define

  - y11, y21, y31 for decision 1, which applies to states 1, 2, 3 (do nothing)
  - y32 for decision 2, which applies to state 3 (overhaul)
  - y23, y33, y43 for decision 3, which applies to states 2, 3, 4 (replace)

What is the relationship between yik and πi?

  - πi = ∑_{k=1}^{K} yik, where K is the total number of decisions
  - For example, π2 = y21 + y23


We know ∑_{i=1}^{∞} πi = 1. How do we translate this to yik?

  - ∑_{i=1}^{∞} ∑_{k=1}^{K} yik = 1
  - y11 + y21 + y31 + y32 + y23 + y33 + y43 = 1

We add this as a constraint to our model.

We also know πj = ∑_{i=1}^{∞} πi Pij. How does this translate to yik?

  - ∑_{k=1}^{K} yjk = ∑_{i=1}^{∞} ∑_{k=1}^{K} yik Pij(k)
  - Write out one such equation for our example (one worked case follows below).
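For instance, take j = 1. Decision 1 is the only decision available in state 1, and the only transitions into state 1 come from replacement (decision 3, which returns the machine to state 1 with probability 1), so

    y11 = y23 + y33 + y43,   i.e.,   y11 − (y23 + y33 + y43) = 0,

which is exactly the state-1 constraint in the full model a few slides ahead.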


We also add this constraint to the model; we now have two constraints.

The last constraint we need restricts each yik ≥ 0.

Now we need an objective function. With an MDP, what are we trying to accomplish?

Minimize Cost or Maximize Reward

Let Cik represent the expected cost of being in state i with decision k.

Our objective function is ∑_{i=1}^{∞} ∑_{k=1}^{K} Cik yik.


Linear Programming for MDPs - Example 2

min   1000 y21 + 6000 y23 + 3000 y31 + 4000 y32 + 6000 y33 + 6000 y43

s.t.  y11 + y21 + y31 + y32 + y23 + y33 + y43 = 1

      y11 − (y23 + y33 + y43) = 0

      y21 + y23 − (7/8 y11 + 3/4 y21 + y32) = 0

      y31 + y32 + y33 − (1/16 y11 + 1/8 y21 + 1/2 y31) = 0

      y43 − (1/16 y11 + 1/8 y21 + 1/2 y31) = 0

      y11, y21, y31, y32, y23, y33, y43 ≥ 0
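As a cross-check on the Excel Solver solution discussed next, this LP can also be solved programmatically. A sketch assuming scipy; the variable ordering and the expected outputs in the comments are my own annotations:

```python
import numpy as np
from scipy.optimize import linprog

# Variable order: y11, y21, y23, y31, y32, y33, y43
c = np.array([0, 1000, 6000, 3000, 4000, 6000, 6000])

A_eq = np.array([
    [1,     1,       1,  1,       1,  1,  1],  # probabilities sum to 1
    [1,     0,      -1,  0,       0, -1, -1],  # balance equation, state 1
    [-7/8,  1 - 3/4, 1,  0,      -1,  0,  0],  # balance equation, state 2
    [-1/16, -1/8,    0,  1 - 1/2, 1,  1,  0],  # balance equation, state 3
    [-1/16, -1/8,    0, -1/2,     0,  0,  1],  # balance equation, state 4
])
b_eq = np.array([1, 0, 0, 0, 0])

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 7)
print(res.x)    # approx [2/21, 5/7, 0, 0, 2/21, 0, 2/21]
print(res.fun)  # approx 1666.67, the minimum average weekly cost
```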


This is a small model, so we can solve it using Excel Solver.

The optimal solution is:

  - y11 = 2/21
  - (y21, y23) = (5/7, 0)
  - (y31, y32, y33) = (0, 2/21, 0)
  - y43 = 2/21

From this solution, we can see that for each state, we get one decision with a positive steady-state value.

Indeed, yik > 0 for only one k for each state i, so the LP yields a single decision for every state.


Linear Programming for MDPs

In general, the LP formulation of an MDP takes the following form:

min(cost) / max(reward)   ∑_{i=1}^{∞} ∑_{k=1}^{K} Cik yik

s.t.  ∑_{i=1}^{∞} ∑_{k=1}^{K} yik = 1

      ∑_{k=1}^{K} yjk − ∑_{i=1}^{∞} ∑_{k=1}^{K} yik Pij(k) = 0   ∀j

      yik ≥ 0   ∀i, k


Counting Process


Definition 3

A stochastic process {N(t), t ≥ 0} is said to be a counting process if N(t) denotes the number of occurrences of some event of interest that have occurred in the interval [0, t].

Note that "time zero" and the "measure of time" (minutes, days, etc.) need to be specified. Note also that time may be measured in discrete steps (e.g., items, people, etc.).


Example 4

The following are examples of counting processes.

1. {N(t), t ≥ 0} where N(t) = number of earthquakes in California

2. {N(t), t ≥ 0} where N(t) = number of births at a hospital

3. {N(t), t ≥ 0} where N(t) = number of hamburgers sold at a restaurant

4. {N(t), t ≥ 0} where N(t) = number of machine failures in a facility

5. {N(t), t ≥ 0} where N(t) = number of lost sales due to stock-outs in an inventory system

6. {N(t), t ≥ 0} where N(t) = number of defectives found in the first t items inspected

7. {N(t), t ≥ 0} where N(t) = number of "yes" votes in the first t people polled


Theorem 5

Let {N(t), t ≥ 0} be a counting process. Then:

N(t) ≥ 0 ∀t ≥ 0

N(t) is integer-valued ∀t ≥ 0

If s < t then N(s) ≤ N(t)

For s < t, N(t)− N(s) is the number of events in (s, t].


Example 6

The following are not examples of counting processes.

{N(t), t ≥ 0} where N(t) = population of Ohio

{N(t), t ≥ 0} where N(t) = number of customers in the restaurant

{N(t), t ≥ 0} where N(t) = number of machines operating in a facility

{N(t), t ≥ 0} where N(t) = number of customers waiting in line

Each of these quantities can decrease over time, violating the monotonicity property in Theorem 5.


Definition 7

Let {N(t), t ≥ 0} be a counting process. Suppose 0 ≤ t1 < t2 ≤ t3 < t4. Then {N(t), t ≥ 0} is said to have independent increments if N(t2) − N(t1) and N(t4) − N(t3) are independent random variables.

Definition 8

Let {N(t), t ≥ 0} be a counting process. Suppose s ≥ 0 and x > 0. Then {N(t), t ≥ 0} is said to have stationary increments if the probability distribution of N(s + x) − N(s) is independent of s.


Bernoulli Processes


Definition 9

Consider a random experiment having two possible outcomes that are referred to as "success" and "failure". Let p denote the probability of success. Then the random experiment is referred to as a Bernoulli trial having success probability p.

Definition 10

Consider a sequence of independent Bernoulli trials with each trial having probability of success p. Let N(t) denote the number of successes in the first t trials. Then {N(t), t = 1, 2, . . .} is said to be a Bernoulli process. This fact is denoted by {N(t), t = 1, 2, . . .} ∼ BP(p).


Example 11

The following are examples of Bernoulli processes.

1. Suppose a manufacturing process is such that 2% of products are defective. Let N(t) denote the number of the first t items produced that are defective.
   - BP(0.02)

2. Suppose 18% of all people are born on a Monday. Let N(t) denote the number of the first t people to enter a room that were born on a Monday.
   - BP(0.18)

3. Suppose 16% of all customers at a restaurant order iced tea. Let N(t) denote the number of the first t customers that order iced tea.
   - BP(0.16)


Theorem 12

Let {N(t), t = 1, 2, . . .} be a Bernoulli process having probability of success p. Then N(t) ∼ binomial(t, p).
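A quick empirical check of Theorem 12; the parameter values and sample sizes here are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
p, t, reps = 0.3, 20, 100_000

# Simulate N(t) for a BP(p): count successes in the first t Bernoulli trials.
N_t = rng.binomial(1, p, size=(reps, t)).sum(axis=1)

# Compare simulated mean/variance with binomial(t, p): tp and tp(1 - p).
print(N_t.mean(), t * p)            # both approx 6.0
print(N_t.var(), t * p * (1 - p))   # both approx 4.2
```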


Definition 13

Consider a sequence of independent Bernoulli trials with each trial having probability of success p. Let T(k) denote the number of trials until the k-th success, where k ∈ {1, 2, . . .}. Then T(k) is a negative binomial random variable with parameters k and p. Note that T(1) is a geometric random variable with parameter p.

Note: previously we discussed negative binomial random variables and stated that for a negative binomial random variable X, E[X] = r(1 − p)/p. This is true when X is defined as the number of failures before the r-th success, meaning X + r total trials are required.

When X is defined as the number of trials at which the r-th success occurs, E[X] = r/p.
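The distinction matters: for k = 3 and p = 0.5, E[T(3)] = 3/0.5 = 6 trials, while the failures-only count has mean 3(1 − 0.5)/0.5 = 3. A simulation sketch (numpy's convention is the failures-only definition):

```python
import numpy as np

rng = np.random.default_rng(0)
k, p, reps = 3, 0.5, 100_000

# numpy's negative_binomial returns the number of FAILURES before the k-th
# success, so its mean is k(1-p)/p; add k to get trials until the k-th success.
failures = rng.negative_binomial(k, p, size=reps)
print(failures.mean())        # approx k*(1-p)/p = 3.0
print((failures + k).mean())  # approx k/p = 6.0, i.e., E[T(k)]
```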
