Slide 1
Hierarchical Markov decision processes
Anders Ringgaard Kristensen
Slide 2
Outline
Graphical representation of models
Markov property
Hierarchical models
Multi-level models
Decisions on multiple time scale
Markov chain simulation
Slide 3
Difficulties when modeling
The curse of dimensionality
• (Multi-level) Hierarchical processes
Decisions on multiple time scales
• (Multi-level) Hierarchical processes
State space representation
• Decision graphs – Discussed later
Several herd constraints
• Inherent problem of the method
• Parameter iteration
The Markov property
• Memory variables
• Bayesian
Slide 4
Graphical representation of MDPs
Recall the structure of the simple dairy cow replacement model:
Stage:
• 1 lactation cycle
State:
• i=1: Low milk yield
• i=2: Average milk yield
• i=3: High milk yield
Action:
• d=1: Keep the cow
• d=2: Replace the cow at the end of the stage
The structure may be displayed graphically in two different ways:
• As a model tree
• As a decision graph
We will model 10 stages (finite horizon)
Slide 5
The model displayed as a tree
We have a nested structure:
• The root of the model is the process itself
• The process holds 10 stages (the time horizon)
• Each stage holds 3 states (Low, Average, High)
• Each state holds 2 actions (Keep, Replace)
The parameters:
• Each action has a set of parameters attached:
• A reward
• A probability distribution (to the states at next stage).
Implemented in the MLHMP software system
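The nested structure above can be sketched directly as data; a minimal illustration in Python (the `make_action` helper and all parameter values are made up for the example, not taken from the real model):

```python
# Sketch of the model tree: process -> stages -> states -> actions,
# each action carrying a reward and a distribution over next-stage states.

STATES = ["Low", "Average", "High"]
ACTIONS = ["Keep", "Replace"]
N_STAGES = 10  # finite horizon, as in the example

def make_action(reward, transition):
    """An action holds a reward and a distribution over next-stage states."""
    assert abs(sum(transition.values()) - 1.0) < 1e-9
    return {"reward": reward, "transition": transition}

# Illustrative parameters only -- not the real model's numbers.
uniform = {s: 1.0 / 3 for s in STATES}
model_tree = {
    stage: {
        state: {
            "Keep": make_action(100.0, {"Low": 0.2, "Average": 0.6, "High": 0.2}),
            "Replace": make_action(20.0, dict(uniform)),
        }
        for state in STATES
    }
    for stage in range(N_STAGES)
}

print(len(model_tree), len(model_tree[0]), len(model_tree[0]["Low"]))  # 10 3 2
```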
Slide 6
The MLHMP software – the model tree window
Slide 7
Summary of the model tree
The nested structure of an MDP is shown directly
Each value (stage, state and action) is displayed as an icon of
a certain type.
A label is (optionally) attached to each value in order to ease
the (human) interpretation of the values.
Asymmetric models are easily handled (and displayed)
Slide 8
The model displayed as a Decision Graph
A Decision Graph consists of variables and directed edges
connecting them:
• A variable is displayed as a circle
• A directed edge is displayed as an arc
In our small example we have basically 3 variables at each
stage:
• The state (random)
• The action (decision)
• The reward (utility)
Slide 9
The Esthauge LIMID software system: The Net window
For a model like this – the DG is rather boring …
Slide 10
Summary of the Decision Graph representation
The causal structure of the model is shown explicitly as directed
edges (arcs).
Each variable is displayed as a node (“circle”) of a certain
kind:
• Chance node (yellow) – state variable
• Decision nodes (green) – actions
• Utility nodes (purple) – rewards
Each variable has a number of values.
Labels may (optionally) be used in order to ease the (human)
interpretation of the variables.
Asymmetric models are difficult to display (and handle)
Slide 11
The Markov property
Let i_n be the state at stage n
The Markov property is satisfied if, and only if,
• P(i_{n+1} | i_n, i_{n-1}, …, i_1) = P(i_{n+1} | i_n)
• In words: The distribution of the state at the next stage depends only on the present state – previous states are not relevant.
This property is crucial in Markov decision processes.
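A minimal simulation sketch of the property: the next state is drawn from a distribution that consults only the present state, never the history (states and probabilities are illustrative, not the lecture's):

```python
import random

# P[i][j] = P(next = j | present = i) -- one row per present state.
P = {
    1: {1: 0.5, 2: 0.4, 3: 0.1},
    2: {1: 0.2, 2: 0.6, 3: 0.2},
    3: {1: 0.1, 2: 0.4, 3: 0.5},
}

def step(i, rng):
    """Draw the next state from P(. | i); the history is never consulted."""
    u, acc = rng.random(), 0.0
    for j, p in P[i].items():
        acc += p
        if u < acc:
            return j
    return j  # guard against floating-point round-off

rng = random.Random(42)
path = [1]
for _ in range(10):
    path.append(step(path[-1], rng))
print(path)
```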
Slide 12
The Markov property – what does it mean?
Does it mean that the state at stage n+k is independent of the
state at stage n for k > 1?
Let us use the DAG to test:
Conditional independence!
Understanding the Markov property is crucial for understanding
MDPs
We shall come back to the Markov property several times.
Slide 13
Age dependency of milk yield
[Chart: milk yield (kg ECM, scale 4,400–6,000) plotted against parity, 1st to 4th]
Slide 14
An extended model, I
State variables
• Age
• Parity 1
• Parity 2
• Parity 3
• Parity 4
• Relative milk yield
• Low
• Average
• High
Slide 15
An extended model, II
Slide 16
An extended model, III
Slide 17
An extended model, IV
Slide 18
Let us take a look at the model tree
Slide 19
Age and genotype dependency
[Chart: milk yield (scale 0–7,000) by parity (Par. 1–4) for low, average and high genetic merit]
Slide 20
A further extended model
State variables
• Genetic merit:
• Low,
• Average,
• High
• Age:
• Parity 1,
• Parity 2,
• Parity 3,
• Parity 4
• (Relative) milk yield:
• Low,
• Average,
• High
Slide 21
Rewards and output
Slide 22
Transition probabilities, Keep
Slide 23
Transition probabilities, Replace
Slide 24
We shall again take a look at the graphical display
Slide 25
An example: Houben et al. (1994)
State variables:
• Age (monthly intervals, 204 levels)
• Milk yield, present lactation (15 levels)
• Milk yield, previous lactation (15 levels)
• Length of calving interval (8 levels)
• Mastitis, present lactation (4 levels)
• Mastitis, previous lactation (4 levels)
• Clinical mastitis (yes/no)
Total state space 6,821,724 states
Houben, E. P. H., R. B. M. Huirne, A. A. Dijkhuizen & A. R. Kristensen. 1994. Optimal replacement of mastitis cows determined by a hierarchic Markov process. Journal of Dairy Science 77, 2975-2993.
Slide 26
The curse of dimensionality
If
• state variables are represented at a realistic number of levels
• all relevant state variables are included in the model
then
• the state space grows to prohibitive dimensions
Solution:
• Hierarchical models
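The growth is multiplicative: for a symmetric state space, the count is the product of the level counts. A quick check using the counts quoted above (note the Houben et al. model's actual 6,821,724 states is smaller than the symmetric product, because its state space is asymmetric):

```python
from math import prod

# Extended dairy model: genetic merit x parity x relative milk yield.
levels = {"genetic merit": 3, "age (parity)": 4, "relative milk yield": 3}
print(prod(levels.values()))  # 36 -- the 36 x 36 matrix of the flat model

# Symmetric upper bound for the Houben et al. (1994) state variables;
# the real model has 6,821,724 states because the space is asymmetric.
houben = [204, 15, 15, 8, 4, 4, 2]
print(prod(houben))  # 11750400
```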
Slide 27
Important observations, transition matrix
Most elements are zero because
• Age is included as a state variable
• Some state variables are constant within animal
• Some state variables are constant over several stages
If state numbers are defined appropriately, the non-zero elements are arranged in a certain pattern
This can be utilized for a hierarchical organisation of the state space!
Slide 28
Illustration of the hierarchy for the example
[Diagram: a founder process (state: Genetic merit, action: Dummy) with child processes for Cow 1, Cow 2, Cow 3, each spanning lactations 1–4 (state: Relative milk yield, action: Keep/Replace)]
Optimization technique
• Policy iteration in the founder process (exact)
• Value iteration in the child processes (exact)
The positive properties of both techniques are combined into a very efficient and exact hierarchic technique
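The value-iteration part can be sketched as a backward recursion over the child's finite horizon, choosing at each stage and state the action that maximizes immediate reward plus expected future value (all parameter values below are random placeholders, not the model's):

```python
import numpy as np

n_states, n_stages = 3, 4
rng = np.random.default_rng(0)
# rewards[d, i] for actions d = 0 (keep), 1 (replace); P[d][i, j] row-stochastic.
rewards = rng.uniform(0, 100, size=(2, n_states))
P = rng.dirichlet(np.ones(n_states), size=(2, n_states))

V = np.zeros(n_states)                       # terminal value
policy = np.zeros((n_stages, n_states), dtype=int)
for n in reversed(range(n_stages)):
    Q = rewards + P @ V                      # Q[d, i] = r(i, d) + sum_j P_d(j|i) V(j)
    policy[n] = Q.argmax(axis=0)             # best action per state at stage n
    V = Q.max(axis=0)

print(V.shape, policy.shape)
```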
Slide 29
The dairy cow replacement model as a hierarchical process
Founder process:
• Stage: Life time of a cow
• State: Genetic merit
• Action: Dummy
Child process:
• Stage: A lactation cycle
• State: Milk yield (relative to genetic merit and lactation)
• Action: Keep/Replace
Benefits:
• The age of the cow is known from the child level stage
• The size of the transition matrices is reduced to 3 x 3 (as compared to 36 x 36 in the original model)
Slide 30
Multi-level processes
The hierarchy may be extended to several levels
Actions may be defined at all levels making simultaneous optimization of decisions with different time horizons possible.
Curse of dimensionality circumvented
Simultaneous optimization of decisions at different levels (time horizon)
Slide 31
Example: Dimensionality
State variables in original model (van Arendonk 1985):
• Age (months) (1-144)
• Milk yield, previous lactation (1-15)
• Milk yield, present lactation (1-15)
Total number of states: 29,880
Stage length: 1 month
Matrix dimension 29,880 x 29,880
Slide 32
As a 2 or 3-level process
Slide 33
Model tree of a hierarchical MDP
In any action of an MDP, the choice of action influences:
• The immediate reward
• The probability distribution of the state at next stage
In a hierarchical model the action (of a parent process) is
modeled as a separate embedded finite time MDP (a child
process):
• The reward is the expected sum of total rewards of the child
• The transition probability distribution is calculated as the matrix
product of all transition matrices of the child process.
The action of a parent process is an ordinary action of which
the reward and transition probabilities are calculated in a
special way (from the child process).
In the model tree we just add a child process to the action!
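The transition part of this collapse can be illustrated directly: the parent-level transition matrix is the product of the child's per-stage matrices (illustrative random matrices; a product of row-stochastic matrices is itself row-stochastic):

```python
import numpy as np

# Child process: 4 stages, 3 states, one row-stochastic matrix per stage.
rng = np.random.default_rng(1)
child_P = [rng.dirichlet(np.ones(3), size=3) for _ in range(4)]

# Parent action's transition matrix: product over all child stages.
parent_P = np.eye(3)
for Pn in child_P:
    parent_P = parent_P @ Pn

print(np.allclose(parent_P.sum(axis=1), 1.0))  # True -- rows still sum to 1
```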
Slide 34
Model tree of a hierarchical MDP
Slide 35
Markov chain simulation
As a supplement to the optimal policy, various technical and economic key figures characterizing the optimal policy may be calculated by Markov chain simulation.
The MLHMP software implements this (refer to exercises).
It is sometimes considered a separate modeling technique (which it is not).
It is done by re-defining rewards and outputs and solving a set of linear equations.
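A minimal sketch of one such set of equations: the limiting state distribution pi of the chain under a fixed policy solves pi P = pi together with sum(pi) = 1, and a key figure such as average reward per stage is then pi . r (matrix and rewards below are hypothetical):

```python
import numpy as np

# Transition matrix under some fixed policy (illustrative numbers).
P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])

# Stack (P^T - I) pi = 0 with the normalisation 1^T pi = 1 and solve
# the (consistent, overdetermined) system by least squares.
A = np.vstack([P.T - np.eye(3), np.ones((1, 3))])
b = np.array([0.0, 0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

r = np.array([100.0, 80.0, 120.0])  # hypothetical per-state rewards
print(pi.round(3), float(pi @ r))   # key figure: average reward per stage
```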
Slide 36
Properties of methods for decision support
[Diagram: properties to assess for dynamic programming – herd constraints, optimization, biological variation, uncertainty, functional limitations, dynamics]