Slide 1
Hierarchical Markov decision processes
Anders Ringgaard Kristensen
Slide 2
Outline
Graphical representation of models
Markov property
Hierarchical models
Multi-level models
Decisions on multiple time scale
Markov chain simulation
Slide 3
Difficulties when modeling
The curse of dimensionality
• (Multi-level) Hierarchical processes
Decisions on multiple time scales
• (Multi-level) Hierarchical processes
State space representation
• Decision graphs – Discussed later
Several herd constraints
• Inherent problem of the method
• Parameter iteration
The Markov property
• Memory variables
• Bayesian
Slide 4
Graphical representation of MDPs
Recall the structure of the simple dairy cow replacement model:
Stage:
• 1 lactation cycle
State:
• i=1: Low milk yield
• i=2: Average milk yield
• i=3: High milk yield
Action:
• d=1: Keep the cow
• d=2: Replace the cow at the end of the stage
The structure may be displayed graphically in two different ways:
• As a model tree
• As a decision graph
We will model 10 stages (finite horizon)
Slide 5
The model displayed as a tree
We have a nested structure:
• The root of the model is the process itself
• The process holds 10 stages (the time horizon)
• Each stage holds 3 states (Low, Average, High)
• Each state holds 2 actions (Keep, Replace)
The parameters:
• Each action has a set of parameters attached:
• A reward
• A probability distribution (to the states at next stage).
Implemented in the MLHMP software system
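The nested structure above can be sketched directly as data; a minimal illustration in Python (the `make_action` helper and all parameter values are made up for the example, not taken from the real model):

```python
# Sketch of the model tree: process -> stages -> states -> actions,
# each action carrying a reward and a distribution over next-stage states.

STATES = ["Low", "Average", "High"]
ACTIONS = ["Keep", "Replace"]
N_STAGES = 10  # finite horizon, as in the example

def make_action(reward, transition):
    """An action holds a reward and a distribution over next-stage states."""
    assert abs(sum(transition.values()) - 1.0) < 1e-9
    return {"reward": reward, "transition": transition}

# Illustrative parameters only -- not the real model's numbers.
uniform = {s: 1.0 / 3 for s in STATES}
model_tree = {
    stage: {
        state: {
            "Keep": make_action(100.0, {"Low": 0.2, "Average": 0.6, "High": 0.2}),
            "Replace": make_action(20.0, dict(uniform)),
        }
        for state in STATES
    }
    for stage in range(N_STAGES)
}

print(len(model_tree), len(model_tree[0]), len(model_tree[0]["Low"]))  # 10 3 2
```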
Slide 6
The MLHMP software – the model tree window
Slide 7
Summary of the model tree
The nested structure of an MDP is shown directly
Each value (stage, state and action) is displayed as an icon of
a certain type.
A label is (optionally) attached to each value in order to ease
the (human) interpretation of the values.
Asymmetric models are easily handled (and displayed)
Slide 8
The model displayed as a Decision Graph
A Decision Graph consists of variables and directed edges
connecting them:
• A variable is displayed as a circle
• A directed edge is displayed as an arc
In our small example we have basically 3 variables at each
stage:
• The state (random)
• The action (decision)
• The reward (utility)
Slide 9
The Esthauge LIMID software system: The Net window
For a model like this – the DG is rather boring …
Slide 10
Summary of the Decision Graph representation
The causal structure of the model is shown explicitly as directed
edges (arcs).
Each variable is displayed as a node (“circle”) of a certain
kind:
• Chance node (yellow) – state variable
• Decision nodes (green) – actions
• Utility nodes (purple) – rewards
Each variable has a number of values.
Labels may (optionally) be used in order to ease the (human)
interpretation of the variables.
Asymmetric models are difficult to display (and handle)
Slide 11
The Markov property
Let i_n be the state at stage n
The Markov property is satisfied if, and only if,
• P(i_{n+1} | i_n, i_{n-1}, …, i_1) = P(i_{n+1} | i_n)
• In words: The distribution of the state at the next stage depends only on the present state – previous states are not relevant.
This property is crucial in Markov decision processes.
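A minimal simulation sketch of the property: the next state is drawn from a distribution that consults only the present state, never the history (states and probabilities are illustrative, not the lecture's):

```python
import random

# P[i][j] = P(next = j | present = i) -- one row per present state.
P = {
    1: {1: 0.5, 2: 0.4, 3: 0.1},
    2: {1: 0.2, 2: 0.6, 3: 0.2},
    3: {1: 0.1, 2: 0.4, 3: 0.5},
}

def step(i, rng):
    """Draw the next state from P(. | i); the history is never consulted."""
    u, acc = rng.random(), 0.0
    for j, p in P[i].items():
        acc += p
        if u < acc:
            return j
    return j  # guard against floating-point round-off

rng = random.Random(42)
path = [1]
for _ in range(10):
    path.append(step(path[-1], rng))
print(path)
```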
Slide 12
The Markov property – what does it mean?
Does it mean that the state at stage n+k is independent of the
state at stage n for k > 1?
Let us use the DAG to test:
Conditional independence!
Understanding the Markov property is crucial for understanding
MDPs
We shall come back to the Markov property several times.
Slide 13
Age dependency of milk yield
[Chart: milk yield (kg ECM, scale 4,400–6,000) plotted against parity, 1st to 4th]
Slide 14
An extended model, I
State variables
• Age
• Parity 1
• Parity 2
• Parity 3
• Parity 4
• Relative milk yield
• Low
• Average
• High
Slide 15
An extended model, II
Slide 16
An extended model, III
Slide 17
An extended model, IV
Slide 18
Let us take a look at the model tree
Slide 19
Age and genotype dependency
[Chart: milk yield (scale 0–7,000) by parity (Par. 1–4) for low, average and high genetic merit]
Slide 20
A further extended model
State variables
• Genetic merit:
• Low,
• Average,
• High
• Age:
• Parity 1,
• Parity 2,
• Parity 3,
• Parity 4
• (Relative) milk yield:
• Low,
• Average,
• High
Slide 21
Rewards and output
Slide 22
Transition probabilities, Keep
Slide 23
Transition probabilities, Replace
Slide 24
We shall again take a look at the graphical display
Slide 25
An example: Houben et al. (1994)
State variables:
• Age (monthly intervals, 204 levels)
• Milk yield, present lactation (15 levels)
• Milk yield, previous lactation (15 levels)
• Length of calving interval (8 levels)
• Mastitis, present lactation (4 levels)
• Mastitis, previous lactation (4 levels)
• Clinical mastitis (yes/no)
Total state space 6,821,724 states
Houben, E. P. H., R. B. M. Huirne, A. A. Dijkhuizen & A. R. Kristensen. 1994. Optimal replacement of mastitis cows determined by a hierarchic Markov process. Journal of Dairy Science 77, 2975-2993.
Slide 26
The curse of dimensionality
If
• state variables are represented at a realistic number of levels
• all relevant state variables are included in the model
then
• the state space grows to prohibitive dimensions
Solution:
• Hierarchical models
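The growth is multiplicative: for a symmetric state space, the count is the product of the level counts. A quick check using the counts quoted above (note the Houben et al. model's actual 6,821,724 states is smaller than the symmetric product, because its state space is asymmetric):

```python
from math import prod

# Extended dairy model: genetic merit x parity x relative milk yield.
levels = {"genetic merit": 3, "age (parity)": 4, "relative milk yield": 3}
print(prod(levels.values()))  # 36 -- the 36 x 36 matrix of the flat model

# Symmetric upper bound for the Houben et al. (1994) state variables;
# the real model has 6,821,724 states because the space is asymmetric.
houben = [204, 15, 15, 8, 4, 4, 2]
print(prod(houben))  # 11750400
```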
Slide 27
Important observations, transition matrix
Most elements are zero because
• Age is included as a state variable
• Some state variables are constant within animal
• Some state variables are constant over several stages
If state numbers are defined appropriately, the non-zero elements are arranged in a certain pattern
This can be utilized for a hierarchical organisation of the state space!
Slide 28
Illustration of the hierarchy for the example
[Diagram: a founder process (state: Genetic merit, action: Dummy) with child processes for Cow 1, Cow 2, Cow 3, each spanning lactations 1–4 (state: Relative milk yield, action: Keep/Replace)]
Optimization technique
• Policy iteration in the founder process (exact)
• Value iteration in the child processes (exact)
The positive properties of both techniques are combined into a very efficient and exact hierarchic technique
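The value-iteration part can be sketched as a backward recursion over the child's finite horizon, choosing at each stage and state the action that maximizes immediate reward plus expected future value (all parameter values below are random placeholders, not the model's):

```python
import numpy as np

n_states, n_stages = 3, 4
rng = np.random.default_rng(0)
# rewards[d, i] for actions d = 0 (keep), 1 (replace); P[d][i, j] row-stochastic.
rewards = rng.uniform(0, 100, size=(2, n_states))
P = rng.dirichlet(np.ones(n_states), size=(2, n_states))

V = np.zeros(n_states)                       # terminal value
policy = np.zeros((n_stages, n_states), dtype=int)
for n in reversed(range(n_stages)):
    Q = rewards + P @ V                      # Q[d, i] = r(i, d) + sum_j P_d(j|i) V(j)
    policy[n] = Q.argmax(axis=0)             # best action per state at stage n
    V = Q.max(axis=0)

print(V.shape, policy.shape)
```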
Slide 29
The dairy cow replacement model as a hierarchical process
Founder process:
• Stage: Life time of a cow
• State: Genetic merit
• Action: Dummy
Child process:
• Stage: A lactation cycle
• State: Milk yield (relative to genetic merit and lactation)
• Action: Keep/Replace
Benefits:
• The age of the cow is known from the child level stage
• The size of the transition matrices is reduced to 3 x 3 (as compared to 36 x 36 in the original model)
Slide 30
Multi-level processes
The hierarchy may be extended to several levels
Actions may be defined at all levels making simultaneous optimization of decisions with different time horizons possible.
Curse of dimensionality circumvented
Simultaneous optimization of decisions at different levels (time horizon)
Slide 31
Example: Dimensionality
State variables in original model (van Arendonk 1985):
• Age (months) (1-144)
• Milk yield, previous lactation (1-15)
• Milk yield, present lactation (1-15)
Total number of states: 29,880
Stage length: 1 month
Matrix dimension 29,880 x 29,880
Slide 32
As a 2 or 3-level process
Slide 33
Model tree of a hierarchical MDP
In any action of an MDP, the choice of action influences:
• The immediate reward
• The probability distribution of the state at next stage
In a hierarchical model the action (of a parent process) is
modeled as a separate embedded finite time MDP (a child
process):
• The reward is the expected sum of total rewards of the child
• The transition probability distribution is calculated as the matrix
product of all transition matrices of the child process.
The action of a parent process is an ordinary action of which
the reward and transition probabilities are calculated in a
special way (from the child process).
In the model tree we just add a child process to the action!
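The transition part of this collapse can be illustrated directly: the parent-level transition matrix is the product of the child's per-stage matrices (illustrative random matrices; a product of row-stochastic matrices is itself row-stochastic):

```python
import numpy as np

# Child process: 4 stages, 3 states, one row-stochastic matrix per stage.
rng = np.random.default_rng(1)
child_P = [rng.dirichlet(np.ones(3), size=3) for _ in range(4)]

# Parent action's transition matrix: product over all child stages.
parent_P = np.eye(3)
for Pn in child_P:
    parent_P = parent_P @ Pn

print(np.allclose(parent_P.sum(axis=1), 1.0))  # True -- rows still sum to 1
```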
Slide 34
Model tree of a hierarchical MDP
Slide 35
Markov chain simulation
As a supplement to the optimal policy, various technical and economic key figures characterizing the optimal policy may be calculated by Markov chain simulation.
The MLHMP software implements this (refer to exercises).
It is sometimes considered a separate modeling technique (which it is not).
It is done by re-defining rewards and outputs and solving a set of linear equations.
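A minimal sketch of one such set of equations: the limiting state distribution pi of the chain under a fixed policy solves pi P = pi together with sum(pi) = 1, and a key figure such as average reward per stage is then pi . r (matrix and rewards below are hypothetical):

```python
import numpy as np

# Transition matrix under some fixed policy (illustrative numbers).
P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])

# Stack (P^T - I) pi = 0 with the normalisation 1^T pi = 1 and solve
# the (consistent, overdetermined) system by least squares.
A = np.vstack([P.T - np.eye(3), np.ones((1, 3))])
b = np.array([0.0, 0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

r = np.array([100.0, 80.0, 120.0])  # hypothetical per-state rewards
print(pi.round(3), float(pi @ r))   # key figure: average reward per stage
```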
Slide 36
Properties of methods for decision support
[Diagram: properties to assess for dynamic programming – herd constraints, optimization, biological variation, uncertainty, functional limitations, dynamics]