The Automatic Explanation of Multivariate Time Series (MTS)
Allan Tucker
The Problem - Data
• Datasets which are Characteristically:
  – High Dimensional MTS
  – Large Time Lags
  – Changing Dependencies
  – Little or No Available Expert Knowledge
The Problem - Requirement
• Lack of Algorithms to Assist Users in Explaining Events where:
  – Model Complex MTS Data
  – Learnable from Data with Little or No User Intervention
  – Transparency Throughout the Learning and Explaining Process is Vital
Contribution to Knowledge
• Using a Combination of Evolutionary Programming (EP) and Bayesian Networks (BNs) to Overcome Issues Outlined
• Extending Learning Algorithms for BNs to Dynamic Bayesian Networks (DBNs) with Comparison of Efficiency
• Introduction of an Algorithm for Decomposing High Dimensional MTS into Several Lower Dimensional MTS
Contribution to Knowledge (Continued)
• Introduction of New EP-Seeded GA Algorithm
• Incorporating Changing Dependencies
• Application to Synthetic and Real-World Chemical Process Data
• Transparency Retained Throughout Each Stage
Framework
[Diagram boxes: Real Data; Data Preparation; Search Methods; Variable Groupings; Synthetic Data; Explanation; Model Building; Evaluation; Changing Dependencies; Pre-processing]
Key Technical Points 1: Comparing Adapted Algorithms
• New Representation
• K2/K3 [Cooper and Herskovits]
• Genetic Algorithm [Larranaga]
• Evolutionary Algorithm [Wong]
• Branch and Bound [Bouckaert]
• Log Likelihood / Description Length
• Publications:
  – International Journal of Intelligent Systems, 2001
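A minimal sketch of how a log likelihood and a description length could be computed for a discrete DBN stored as (parent, child, lag) triples. The exact metric used in the talk is not given on the slide; the penalty term below is a common MDL form (half of log2 of the sample size per free parameter) and is an assumption, as are the function names and the `arity` parameter.

```python
from collections import Counter
from math import log2


def log_likelihood(data, triples, arity=2):
    """Return (log2 likelihood, number of free parameters) under ML estimates.

    data is a list of rows, one row of discrete values per time step.
    triples are (parent, child, lag); this reading is an assumption.
    """
    n_vars = len(data[0])
    start = max((lag for _, _, lag in triples), default=0)
    total = 0.0
    n_params = 0
    for child in range(n_vars):
        pars = [(p, lag) for p, c, lag in triples if c == child]
        joint, marg = Counter(), Counter()
        for t in range(start, len(data)):
            cfg = tuple(data[t - lag][p] for p, lag in pars)
            joint[cfg, data[t][child]] += 1
            marg[cfg] += 1
        # Each observed (parents, value) cell contributes n * log2(n / n_parents).
        total += sum(n * log2(n / marg[cfg]) for (cfg, _), n in joint.items())
        n_params += (arity - 1) * arity ** len(pars)
    return total, n_params


def description_length(data, triples, arity=2):
    """Assumed MDL-style score in bits: model cost plus data cost (lower is better)."""
    ll, n_params = log_likelihood(data, triples, arity)
    n_rows = len(data) - max((lag for _, _, lag in triples), default=0)
    return 0.5 * log2(n_rows) * n_params - ll
```

Under this sketch, adding a link that makes a child fully predictable removes that child's data cost and adds only a small parameter penalty, so the description length falls.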
Key Technical Points 2: Grouping
• A Number of Correlation Searches
• A Number of Grouping Algorithms
• Designed Metrics
• Comparison of All Combinations
• Synthetic and Real Data
• Publications:
  – IDA99
  – IEEE Trans System Man and Cybernetics 2001
  – Expert Systems 2000
Key Technical Points 3: EP-Seeded GA
• Approximate Correlation Search Based on the One Used in Grouping Strategy
• Results Used to Seed Initial Population of GA
• Uniform Crossover
• Specific Lag Mutation
• Publications:
  – Genetic Algorithms and Evolutionary Computation Conference 1999 (GECCO99)
  – International Journal of Intelligent Systems, 2001
  – IDA2001
Key Technical Points 4: Changing Dependencies
• Dynamic Cross Correlation Function for Analysing MTS
• Extend Representation
• Introduce a Heuristic Search - Hidden Controller Hill Climb (HCHC)
  – Hidden Variables to Model State of the System
  – Search for Structure and Hidden States Iteratively
Future Work
• Parameter Estimation
• Discretisation
• Changing Dependencies
• Efficiency
• New Datasets
  – Gene Expression Data
  – Visual Field Data
DBN Representation
[Figure: DBN drawn over time slices t-4, t-3, t-2, t-1, t. Nodes at time t: a0(t), a1(t), a2(t), a3(t), a4(t). Lagged nodes: a2(t-2), a3(t-2), a4(t-3), a3(t-4).]
(3,1,4) (4,2,3) (2,3,2) (3,0,2) (3,4,2)
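The triple list can be unpacked into explicit links with a few lines of code. This is a sketch only: it assumes each triple reads (parent, child, lag), which matches the lagged nodes drawn on the slide (a3(t-4), a4(t-3), a2(t-2), a3(t-2)) but is not stated there. The function names are illustrative.

```python
from collections import defaultdict

# Triples copied from the slide; the (parent, child, lag) reading is an assumption.
TRIPLES = [(3, 1, 4), (4, 2, 3), (2, 3, 2), (3, 0, 2), (3, 4, 2)]


def parents_by_child(triples):
    """Map each child variable at time t to its list of (parent, lag) pairs."""
    parents = defaultdict(list)
    for parent, child, lag in triples:
        parents[child].append((parent, lag))
    return dict(parents)


def describe(triples):
    """Render each link as 'a<parent>(t-<lag>) -> a<child>(t)'."""
    return [f"a{p}(t-{lag}) -> a{c}(t)" for p, c, lag in triples]


def max_lag(triples):
    """Largest lag used, i.e. how many past time slices the network looks back."""
    return max(lag for _, _, lag in triples)


if __name__ == "__main__":
    for line in describe(TRIPLES):
        print(line)
```

A flat list of triples keeps the structure easy to mutate and recombine, which suits the evolutionary search methods compared in the talk.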
Sample DBN Search Results
[Figure: two plots of Description Length against Function Calls comparing K3, EP and GA. Panels: N = 5, MaxT = 10 and N = 10, MaxT = 60.]
Grouping
One High Dimensional MTS (A)
1. Correlation Search (EP)
   List R: (a, b, lag) (a, b, lag) (a, b, lag)
2. Grouping Algorithm (GGA)
   Groups G: {0,3} {1,4,5} {2}
Several Lower Dimensional MTS
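The two-step pipeline can be illustrated with a deliberately simple stand-in for the second step. The sketch below does not implement the grouping genetic algorithm (GGA) from the slide: it swaps in plain connected components over the correlation list, which is the simplest way to turn (a, b, lag) triples into disjoint groups. The example triples are hypothetical.

```python
def group_variables(n_vars, triples):
    """Group variables that are linked by any (a, b, lag) triple.

    Stand-in for the slide's grouping algorithm (GGA): this uses
    connected components via union-find, not a genetic search.
    """
    parent = list(range(n_vars))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, _lag in triples:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    groups = {}
    for v in range(n_vars):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


def split_series(series, groups):
    """Split one MTS (a list of rows) into one lower dimensional MTS per group."""
    return [[[row[v] for v in g] for row in series] for g in groups]


if __name__ == "__main__":
    # Hypothetical correlation list for a six-variable MTS.
    found = [(0, 3, 2), (1, 4, 1), (4, 5, 7)]
    print(group_variables(6, found))
```

With the hypothetical list above, the groups come out as {0,3}, {1,4,5} and {2}. A search-based grouping would additionally be able to discard weak or spurious correlations rather than merging on every listed pair.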
Sample Grouping Results
Original Synthetic MTS Groupings
0 1 2
3 4 5 6 7
8 9 10 11 12
13 14 15 16 17 18 19 20 21 22
23 24 25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40 41 42
43 44 45 46 47 48 49
50 51 52 53 54 55
56 57 58
59 60
Groupings Discovered from Synthetic Data
0 6123 4 5 789 1011 121314 15 20 21 2216 17 18 19
23 24 25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40 41 42
43 44 45 46 47 48 49
50 51 52 53 54 55
56 57 58
59 60
[Figure: Sample of Variables from a Discovered Oil Refinery Data Group. Magnitude (Temp etc) against Time.]
Parameter Estimation
• Simulate Random Bag (Vary R, s and c, e)
• Calculate Mean and SD for Each Distribution (the Probability of Selecting e from s)
• Test for Normality (Lilliefors’ Test)
• Symbolic Regression (GP) to Determine the Function for Mean and SD from R, s and c (e will be Unknown)
• Place Confidence Limits on the P(Number of Correlations Found e)
EP-Seeded GA
EP: Final EPList
  0: (a,b,l)
  1: (a,b,l)
  2: (a,b,l)
  …
  EPListSize: (a,b,l)
GA: Initial GA Population
  0: ((a,b,l),(a,b,l)…(a,b,l))
  1: ((a,b,l),(a,b,l)…(a,b,l))
  2: ((a,b,l),(a,b,l)…(a,b,l))
  …
  GAPopsize: ((a,b,l) … (a,b,l))
DBN
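A rough sketch of the seeding step and the two operators named in the talk (uniform crossover and lag mutation). Only the operators are shown; the fitness function and selection loop are omitted. The fixed number of links per individual, the function names and the way individuals are sampled from the EP list are all assumptions made for illustration.

```python
import random


def seed_population(ep_list, pop_size, links_per_dbn, rng):
    """Build each initial GA individual by sampling triples from the final EP list.

    Requires links_per_dbn <= len(ep_list).
    """
    return [rng.sample(ep_list, links_per_dbn) for _ in range(pop_size)]


def uniform_crossover(mum, dad, rng):
    """Swap each position between two equal-length parents with probability 0.5."""
    kid_a, kid_b = [], []
    for x, y in zip(mum, dad):
        if rng.random() < 0.5:
            x, y = y, x
        kid_a.append(x)
        kid_b.append(y)
    return kid_a, kid_b


def lag_mutation(individual, rate, max_lag, rng):
    """With probability `rate` per triple, redraw only the lag component."""
    out = []
    for a, b, lag in individual:
        if rng.random() < rate:
            lag = rng.randint(1, max_lag)
        out.append((a, b, lag))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical EP list of (a, b, l) triples.
    ep_list = [(i, (i + 1) % 5, 1 + i % 3) for i in range(8)]
    population = seed_population(ep_list, pop_size=10, links_per_dbn=4, rng=rng)
    child_a, child_b = uniform_crossover(population[0], population[1], rng)
    print(lag_mutation(child_a, rate=0.1, max_lag=60, rng=rng))
```

Seeding from the EP list means the GA starts from links that already showed some correlation, so early generations spend fewer function calls on unpromising structures.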
EP-Seeded GA Results
[Figure: two plots of Description Length against Function Calls comparing EP, EP Seeded GA (c=20%) and EP Seeded GA (c=100%). Panels: N = 10, MaxT = 60 and N = 20, MaxT = 60.]
Varying the value of c
[Figure: Log Likelihood against Function Calls for c=10%, c=20%, c=30%, c=50%, c=70%, c=100% and EP.]
Time Explanation
Time: t, t-1, t-11, t-13, t-16, t-20, t-60
Explanation:
P(TGF in state_0) = 1.0
P(TT in state_0) = 1.0 P(BPF in state_3) = 1.0
P(TT in state_1) = 0.446
P(TGF in state_3) = 1.0
P(SOT in state_0) = 0.314
P(C2% in state_0) = 0.279
P(T6T in state_0) = 0.347
P(RinT in state_0) = 0.565
Changing Dependencies
[Figure: two plots of Variable Magnitude against Time (Minutes) for A/M_GB and TGF.]
Dynamic Cross-Correlation Function
[Figure: surface of correlation over Time Lag and Window Position, banded from -0.3 to 0.4.]
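One plausible reading of the dynamic cross-correlation function, given the axes of the figure (Time Lag and Window Position), is a cross-correlation recomputed inside a sliding window. The sketch below follows that reading; the definition, the function names and the `win_len`, `win_jump` and `max_lag` parameters are assumptions rather than the author's specification.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def dynamic_ccf(x, y, win_len, win_jump, max_lag):
    """Return rows[window][lag] = corr(x[t - lag], y[t]) for t inside each window.

    Windows start at max_lag so that every lagged slice of x exists.
    """
    rows = []
    for start in range(max_lag, len(y) - win_len + 1, win_jump):
        ys = y[start:start + win_len]
        row = []
        for lag in range(max_lag + 1):
            xs = x[start - lag:start - lag + win_len]
            row.append(pearson(xs, ys))
        rows.append(row)
    return rows


if __name__ == "__main__":
    x = [(t * t) % 17 for t in range(200)]
    y = [0, 0, 0] + x[:-3]  # y follows x with a lag of 3
    for row in dynamic_ccf(x, y, win_len=50, win_jump=25, max_lag=5):
        print([round(r, 2) for r in row])
```

If the strongest lag or the sign of the correlation shifts from one window position to the next, that is a hint that the dependency between the two variables changes over time.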
Hidden Variable - OpState
[Figure: DBN drawn over time slices t-4, t-3, t-2, t-1, t with nodes a2(t), OpState2, a2(t-1), a3(t-2), a0(t-4).]
Hidden Controller Hill Climb
[Diagram: loop over < DBN_List > and < Segment_Lists > with a Score]
• Update Segment_Lists through Op_State Parameter Estimation
• Update DBN_List through DBN Structure Search
HCHC Results - Oil Refinery Data
[Figure: Most Significant Correlation (-1 to 1) against Window Position for A/M_GB, SOT and T6T, with Segments marked.]
HCHC Results - Synthetic Data
                               MTS 1    MTS 2    MTS 3
Number of Original Links       12       26       16
Spurious Links                 2.3      2.9      4.0
Implicit Links                 2.3      1.0      0.4
Missed Links                   1.0      2.8      1.4
Total SD                       5.6      6.7      5.8
Original Segmentation Length   1000     1000     500
Segmentation Error             15.89    16.08    14.157
Missed Segmentations           0.6      0.0      1.2
Spurious Segmentation          0.9      0.5      0.8
• Generate Data from Several DBNs
• Append each Section of Data Together to Form One MTS with Changing Dependencies
• Run HCHC
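The first two steps of the synthetic experiment (generate a section of data per DBN, then append the sections) can be sketched as below. The conditional distributions are a toy choice made for illustration: a child copies its lagged parent with probability 1 - noise and is otherwise a fair coin flip. Binary variables, one parent per child, lags of at least 1, and all function names are assumptions and are not taken from the source.

```python
import random


def generate_section(triples, n_vars, length, history, rng, noise=0.1):
    """Extend `history` by `length` rows drawn from one toy DBN.

    triples are (parent, child, lag) with lag >= 1. Each child copies its
    lagged parent with probability 1 - noise; everything else is a coin flip.
    If a child has several triples, only the last one is used.
    """
    rows = list(history)
    parent_of = {child: (parent, lag) for parent, child, lag in triples}
    for _ in range(length):
        t = len(rows)
        row = []
        for v in range(n_vars):
            link = parent_of.get(v)
            if link and t - link[1] >= 0 and rng.random() >= noise:
                row.append(rows[t - link[1]][link[0]])
            else:
                row.append(rng.randint(0, 1))
        rows.append(row)
    return rows


def generate_changing_mts(dbns, n_vars, section_length, seed=0, noise=0.1):
    """Append one section per DBN; return the MTS and the true change points."""
    rng = random.Random(seed)
    rows, ends = [], []
    for triples in dbns:
        rows = generate_section(triples, n_vars, section_length, rows, rng, noise)
        ends.append(len(rows))
    return rows, ends[:-1]


if __name__ == "__main__":
    # Two hypothetical DBNs: a0(t-2) -> a1(t), then a2(t-1) -> a1(t).
    mts, change_points = generate_changing_mts([[(0, 1, 2)], [(2, 1, 1)]], 3, 1000)
    print(len(mts), change_points)
```

Keeping the true change points alongside the data is what makes it possible to report figures such as segmentation error and missed or spurious segmentations after the search has run.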
Time Explanation
Time: t, t-1, t-3, t-5, t-6, t-9
Explanation:
P(OpState1 is 0) = 1.0 P(a1 is 0) = 1.0 P(a0 is 0) = 1.0 P(a2 is 1) = 1.0
P(OpState1 is 0) = 1.0 P(a1 is 1) = 1.0 P(a0 is 0) = 1.0 P(a2 is 1) = 1.0
P(a2 is 0) = 0.758
P(a2 is 0) = 0.545
P(a0 is 0) = 0.968
P(a0 is 1) = 0.517
P(OpState0 is 0) = 0.519
P(a0 is 1) = 0.778
P(OpState0 is 0) = 0.720
Time Explanation
Time: t, t-1, t-3, t-5, t-6, t-7, t-9
Explanation:
P(OpState1 is 4) = 1.0 P(a1 is 0) = 1.0 P(a0 is 0) = 1.0 P(a2 is 1) = 1.0
P(OpState1 is 4) = 1.0 P(a1 is 1) = 1.0 P(a0 is 0) = 1.0 P(a2 is 1) = 1.0
P(a2 is 1) = 0.570
P(a2 is 1) = 0.974
P(a0 is 0) = 0.506
P(a0 is 1) = 0.549
P(OpState2 is 3) = 0.210
P(a2 is 0) = 0.882
P(OpState2 is 4) = 0.222
Process Diagram
[Diagram labelled with the process variables: TT, T6T, T36T, RBT, SOTT11, SOFT13, TGF, BPF, %C3, %C2, RINT, FF, PGM, PGB, AFT, C11/3T]
Typical Discovered Relationships
[Diagram over the process variables: TT, T6T, T36T, RBT, SOTT11, SOFT13, AFT, TGF, BPF, %C3, %C2, RINT, FF, C11/3T, PGM, PGB]
Parameters
DBN Search   GA            EP
PopSize      100           10
MR           0.1           0.8
CR           0.8           ---
Gen          Based on FC   Based on FC

Correlation Search
c - Approx. 20% of s
R - Approx. 2.5% of s

Grouping GA   Synth. 1   Synth. 2-6           Oil
PopSize       150        100                  150
CR            0.8        0.8                  0.8
MR            0.1        0.1                  0.1
Gen           150        100 (1000 for GPV)   150

Parameters
EP-Seeded GA
c - Approx. 20% of s
EPListSize - Approx. 2.5% of s
GAPopSize - 10
MR - 0.1
CR - 0.8
LMR - 0.1
Gen - Based on FC

HCHC             Oil      Synthetic
DBN_Iterations   1×10^6   5000
Winlen           1000     200
Winjump          500      50