Extending Evolutionary Programming to the Learning of
Dynamic Bayesian Networks
Allan Tucker
Xiaohui Liu
Birkbeck College
University of London
Diagnosis in MTS
• Useful to know the causes of a given set of observations within a time series
• E.g. oil refinery: "Why has a temperature become high whilst a pressure has fallen below a certain value?"
• A possible paradigm for performing diagnosis is the Bayesian Network (BN)
• Evolutionary methods can be used to learn BNs
• Extend this work to Dynamic Bayesian Networks (DBNs)
Dynamic Bayesian Networks
• Static BNs repeated over t time slices
• Contemporaneous / Non-Contemporaneous Links
• Used for Prediction / Diagnosis within dynamic systems
Assumptions - 1
• Assume every variable takes at least one time slice to have an effect on any other.
• The more frequently a system generates data, the more likely this will be true (e.g. every minute, second etc.)
• Contemporaneous Links are excluded from the DBN.
Representation
• N variables at time slice, t
• P Triples of the form (x,y,lag)
• Each triple represents a link from a node at a previous time slice to a node at time t.
Example: { (0,0,1); (1,1,1); (2,2,1); (3,3,1); (4,4,1); (0,1,3); (2,1,2); (2,4,5); (4,3,4) }
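The triple representation above can be sketched directly as a list of tuples; the `parents` helper is an illustrative name, not part of the original method:

```python
# Sketch: a DBN structure as a list of (x, y, lag) triples, following the
# representation above - a link from variable x at time t - lag to variable y
# at time t. The example structure is the one given on this slide.
dbn = [
    (0, 0, 1), (1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 4, 1),  # auto-links, lag 1
    (0, 1, 3), (2, 1, 2), (2, 4, 5), (4, 3, 4),             # cross links
]

def parents(node, dbn):
    """All (parent, lag) pairs feeding `node` at time t."""
    return [(x, lag) for (x, y, lag) in dbn if y == node]

print(parents(1, dbn))  # -> [(1, 1), (0, 3), (2, 2)]
```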
Search Space
• Given the first assumption and the proposed representation, the search space will be: 2^(N² × MaxLag)
• E.g. 10 variables, MaxLag = 30: 2^3000 possible structures
• Make further assumptions to reduce this and speed up the search
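The search-space size follows from each of the N × N × MaxLag candidate links being either present or absent; a one-line sanity check (the function name is illustrative):

```python
# Sketch: size of the DBN structure search space with no contemporaneous
# links. Each of the N*N*MaxLag candidate links is independently present or
# absent, so there are 2**(N*N*MaxLag) possible structures.
def search_space_size(n_vars, max_lag):
    return 2 ** (n_vars * n_vars * max_lag)

# 10 variables, MaxLag = 30 -> 2^3000 structures
print(search_space_size(10, 30) == 2 ** 3000)  # -> True
```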
Assumptions - 2
• Cross-Description-Length Function (CDLF)
• Exhibits the smoothness of its Cross-Correlation Function (CCF) cousin
• Exploit this smoothness using the Swap operator

[Figure: DL of link (4,6) over differing lags]
Assumptions - 3
• Auto-Description-Length Function (ADLF) exhibits the lowest DL at time lag = 1
• Automatically insert these links before evaluation

[Figure: ADLF for variable 2]
[Diagram: system overview]
Multivariate Time Series → Evolutionary Programming to find links with low Description Length → Evolutionary Program (Swap) to find a Dynamic Bayesian Network with low Description Length → Dynamic Bayesian Network → Explanation Algorithm (using Stochastic Simulation) ↔ User
EP to Find low DL links
• Use an EP approach with self-adapting parameters to find a good selection of links with low DL (high mutual information)
• Representation: Each individual is a triple (x,y,lag)
• Fitness is DL of triple
• Solution is the resultant population
x_i′ = x_i + N_i(0, s_i)
s_i′ = s_i · exp(τ′ · N(0, 1) + τ · N_i(0, 1))
where τ = 1/√(2√n), τ′ = 1/√(2n)
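The self-adaptive mutation above can be sketched as follows; this assumes the standard EP log-normal update of the strategy parameters (function and variable names are illustrative):

```python
import math
import random

# Sketch of EP self-adaptive mutation, assumed to match the update rules on
# this slide: each individual carries per-component strategy parameters s_i,
# which are perturbed log-normally before mutating the object variables x_i.
def self_adaptive_mutate(x, s, n):
    tau_prime = 1.0 / math.sqrt(2.0 * n)         # global learning rate
    tau = 1.0 / math.sqrt(2.0 * math.sqrt(n))    # per-component learning rate
    global_step = random.gauss(0.0, 1.0)         # shared N(0,1) draw
    new_x, new_s = [], []
    for xi, si in zip(x, s):
        si2 = si * math.exp(tau_prime * global_step + tau * random.gauss(0.0, 1.0))
        new_s.append(si2)                        # s_i' stays strictly positive
        new_x.append(xi + random.gauss(0.0, si2))
    return new_x, new_s
```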
Swap Operator
• Select a triple from one parent with equal probability
• Mutate the current lag with a uniform distribution: Current Lag + U(−MaxLag/10, MaxLag/10)

[Diagram: the selected triple's lag shifted within [Lag − MaxLag/10, Lag + MaxLag/10] on the axis 1..MaxLag]
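A minimal sketch of the Swap operator's lag mutation; the clamping of out-of-range lags to [1, MaxLag] is an assumption, as the slides do not say how those are handled:

```python
import random

# Sketch of the Swap operator's lag mutation: pick one triple from a parent
# uniformly, then shift its lag by U(-MaxLag/10, MaxLag/10). Clamping to
# [1, MaxLag] is an assumption not stated on the slide.
def swap_mutate(parent, max_lag, rng=random):
    i = rng.randrange(len(parent))            # triple chosen with equal probability
    x, y, lag = parent[i]
    step = max_lag / 10.0
    new_lag = int(round(lag + rng.uniform(-step, step)))
    new_lag = max(1, min(max_lag, new_lag))   # keep lag within [1, MaxLag]
    child = list(parent)
    child[i] = (x, y, new_lag)
    return child
```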
Oil Refinery Data
• Data recorded every minute
• Hundreds of variables
• Selected 11 interrelated variables
• Discretized each variable into 4 states
• Large time lags (up to 120 minutes between some variables)
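The slides do not say which discretization policy was used; a minimal equal-width sketch into 4 states:

```python
# Sketch: equal-width discretization of a continuous series into 4 states.
# This is one plausible policy; the slides do not specify the authors' choice.
def discretize(series, n_states=4):
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_states or 1.0   # constant series falls into state 0
    return [min(int((v - lo) / width), n_states - 1) for v in series]

print(discretize([0.0, 1.0, 2.0, 3.0, 4.0]))  # -> [0, 1, 2, 3, 3]
```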
Parameters

EP - DBN Structure           Synthetic   Oil Data
Population Size              20          30
Generations                  500         2000
OpRate (KGM/Swap)            80%         80%
Slide Mutation Rate          10%         10%

Data                         Synthetic   Oil Data
Number of Variables          5           11
MaxLag                       30          60

EP - DL Links                Synthetic   Oil Data
Mutation Rate                1%          1%
List Limit (% of all links)  25          5
Generations                  30          50
Results 1 - Synthetic Data
[Figure: DL against generation for KGM and Swap on the synthetic data]
Results 2 - Oil Refinery Data
[Figure: DL against generation for KGM and Swap on the oil refinery data]
Results 3
[Figure: DL against generation comparing Standard GA, KGM and Swap]
Explanations

Input:
t - 0 :  Tail Gas Flow in_state 0
         Reboiler Temperature in_state 3

Output:
t - 7 :  Top Temperature in_state 0
t - 54 : Feed Rate in_state 1
t - 75 : Reactor Inlet Temperature in_state 0
Future Work
• Exploring the use of different metrics
• Improving accuracy (e.g. different discretization policies, continuous DBNs)
• Learning a library of DBNs in order to classify the current state of a system