Bridging the Gap between Applications and Tools: Modeling Multivariate Time Series
X Liu, S Swift & A Tucker
Department of Computer Science
Birkbeck College
University of London
MTS (Multivariate Time Series) Applications at Birkbeck
Screening
Forecasting
Explanation
Forecasting
Predicting Visual Field Deterioration of Glaucoma Patients
Function Prediction for Novel Proteins from Multiple Sequence/Structure Data
Explanation
Input (observations):
t - 0 : Tail Gas Flow in_state 0
t - 3 : Reboiler Temperature in_state 1
Output (explanation):
t - 7 : Top Temperature in_state 0 with probability=0.92
t - 54 : Feed Rate in_state 1 with probability=0.71
t - 75 : Reactor Temperature in_state 0 with probability=0.65
The Gaps
Screening: Automatic / Semi-Automatic Analysis of Outliers
Forecasting: Analysing Short Multivariate Time Series
Explanation: Coping with Huge Search Spaces
The Problem - What/Why/How
Short-Term Forecasting of Visual Field Progression Using a Statistical MTS Model
The Vector Auto-Regressive Process - VAR(P)
There Could Be Problems if the MTS Is Short
A Modified Genetic Algorithm (GA) Can Be Used: VARGA
The Prediction of Visual Field Deterioration Plays an Important Role in the Management of the Condition
Background - The Dataset
The interval between tests is about 6 months
Typically, 76 points are measured
The number of tests can range between 10 and 44
[Figure: right-eye visual field map of the 76 test locations, numbered 1 to 76. Locations marked x are the points used in this paper; the usual position of the blind spot is also shown.]
Values range between 60 (very good) and 0 (blind)
Background - The VAR Process
Vector Auto-Regressive Process of Order P: VAR(P)
$$x(t) = \sum_{i=1}^{P} A_i \, x(t-i) + \varepsilon(t)$$
x(t): VF test for data points at time t (K×1)
A_i: parameter matrix at lag i (K×K)
x(t-i): VF test for data points at lag i from t (K×1)
ε(t): observational noise at time t (K×1)
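To make the VAR(P) definition concrete, here is a minimal sketch of a one-step-ahead point forecast. It assumes the noise ε(t) has zero mean, so the forecast is just the weighted sum of the last P observation vectors; the function names are hypothetical and this is not the authors' code.

```python
from typing import List, Sequence

Matrix = List[List[float]]
Vector = List[float]


def mat_vec(a: Matrix, x: Vector) -> Vector:
    """Multiply a K x K matrix by a length-K vector."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def var_forecast(coeffs: Sequence[Matrix], history: Sequence[Vector]) -> Vector:
    """One-step-ahead VAR(P) forecast: x_hat(t) = sum_{i=1..P} A_i x(t-i).

    coeffs[i - 1] holds A_i and history[-i] holds x(t-i). The noise term is
    assumed to have zero mean, so it contributes nothing to the point forecast.
    """
    p = len(coeffs)
    if len(history) < p:
        raise ValueError("need at least P past observations")
    forecast = [0.0] * len(history[-1])
    for i in range(1, p + 1):
        contribution = mat_vec(coeffs[i - 1], history[-i])
        forecast = [f + c for f, c in zip(forecast, contribution)]
    return forecast
```

For example, with K = 2 and P = 1, `var_forecast([[[0.5, 0.0], [0.1, 0.8]]], [[20.0, 30.0]])` gives approximately `[10.0, 26.0]`. A short series limits how many of the P·K·K coefficients can be estimated reliably, which is the problem the slides raise for short MTS.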
The Genetic Algorithm
Generate a Population of random Chromosomes (Solutions)
Repeat for a number of Generations
  Cross Over the current Population
  Mutate the current Population
  Select the Fittest for the next Population
Loop
The best solution to the problem is the Chromosome in the last generation which has the highest Fitness
“A Search/Optimisation method that solves a problem
through maintaining and improving a population of
suitable candidate solutions using biological metaphors”
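The loop above can be sketched as a small bit-string GA. This is an illustration only: the population size, mutation probability, random crossover point and the choice to select the fittest from parents plus children are all assumptions, and the toy fitness (count of 1 bits) is hypothetical.

```python
import random


def run_ga(fitness, n_bits, pop_size=20, generations=50,
           mutation_prob=0.05, seed=0):
    """Minimal bit-string GA following the slide's loop: generate a random
    population, then repeatedly cross over, mutate and keep the fittest.
    Returns the fittest chromosome of the last generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Cross over: pair random parents and swap tails at a random point.
        children = []
        for _ in range(pop_size // 2):
            a, b = rng.sample(population, 2)
            cut = rng.randint(1, n_bits - 1)
            children += [a[:cut] + b[cut:], b[:cut] + a[cut:]]
        # Mutate: flip each gene of each child with probability mutation_prob.
        children = [[1 - g if rng.random() < mutation_prob else g for g in c]
                    for c in children]
        # Select the fittest pop_size chromosomes from parents and children.
        population = sorted(population + children, key=fitness,
                            reverse=True)[:pop_size]
    return max(population, key=fitness)
```

Calling `run_ga(sum, 12)` searches for a 12-bit string with as many 1s as possible. Keeping parents in the selection pool means the best fitness never decreases between generations.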
GAs - Chromosome Example
X: 0-127, encoded in 7 bits (0000000-1111111)
Y: 0-31, encoded in 5 bits (00000-11111)
Chromosome XY: 0000000.00000-1111111.11111
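Decoding such a chromosome back into X and Y amounts to splitting the bit string into fixed-width fields and reading each as an unsigned binary number. A minimal sketch, assuming the 7-bit/5-bit layout of the example; the function name is hypothetical.

```python
def decode(chromosome: str, widths=(7, 5)) -> tuple:
    """Split a bit string into fields of the given widths and read each field
    as an unsigned binary integer (X: 7 bits, 0-127; Y: 5 bits, 0-31)."""
    bits = chromosome.replace(".", "")  # the slide writes X and Y separated by '.'
    if len(bits) != sum(widths):
        raise ValueError("chromosome length does not match field widths")
    values, start = [], 0
    for w in widths:
        values.append(int(bits[start:start + w], 2))
        start += w
    return tuple(values)
```

For example, `decode("1111111.11111")` returns `(127, 31)`, the upper ends of both ranges.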
GAs - Mutation
Each Bit (gene) of a Chromosome is Given a Chance MP of Inverting
A '1' becomes a '0', and a '0' becomes a '1'
Before: 01101101
After:  00101111 (the inverted bits are the second and the seventh)
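Bit-flip mutation with probability MP can be sketched as below, together with a helper that recovers which genes changed in the slide's example. The function names and the use of Python's `random.Random` are assumptions for illustration.

```python
import random


def mutate(chromosome: str, mp: float, rng: random.Random) -> str:
    """Bit-flip mutation: each gene is inverted independently with probability mp."""
    flip = {"0": "1", "1": "0"}
    return "".join(flip[g] if rng.random() < mp else g for g in chromosome)


def flipped_positions(before: str, after: str) -> list:
    """1-based positions of the genes that differ between two chromosomes."""
    return [i + 1 for i, (b, a) in enumerate(zip(before, after)) if b != a]
```

`flipped_positions("01101101", "00101111")` returns `[2, 7]`. With `mp = 1.0` every gene flips; with `mp = 0.0` the chromosome is unchanged.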
GAs - Crossover (2)
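Parents: A = 01011101, B = 11101010
Crossover point: X = 4
Children: C = 01011010, D = 11101101
(C takes the first 4 bits of A and the remaining bits of B; D takes the first 4 bits of B and the remaining bits of A)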
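The one-point crossover in this example can be written in a few lines; the sketch below (function name hypothetical) reproduces children C and D from parents A and B with X = 4.

```python
def one_point_crossover(a: str, b: str, x: int) -> tuple:
    """Swap the tails of two equal-length parents after position x."""
    if len(a) != len(b) or not 0 < x < len(a):
        raise ValueError("parents must have equal length and 0 < x < length")
    return a[:x] + b[x:], b[:x] + a[x:]
```

`one_point_crossover("01011101", "11101010", 4)` returns `("01011010", "11101101")`, matching C and D.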
VARGA - Representation
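The chromosome is the sequence of VAR parameter matrices A_1, A_2, ..., A_m, ..., A_P, each a K×K matrix:
$$\text{Chromosome} = [\, A_1 \mid A_2 \mid \cdots \mid A_m \mid \cdots \mid A_P \,], \qquad A_m = \begin{pmatrix} a^{m}_{11} & \cdots & \cdots \\ \cdots & a^{m}_{ij} & \cdots \\ \cdots & \cdots & a^{m}_{KK} \end{pmatrix}$$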
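One way to picture this representation is to lay the matrices end to end, row by row, and to rebuild them from the flat gene list. This sketch stores real-valued coefficients directly; it does not model how VARGA's bounded natural-number genes map to coefficient values, which the slides do not specify. Function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def flatten(coeffs: List[Matrix]) -> List[float]:
    """Lay out A_1..A_P one after another, each row by row (a^m_11 ... a^m_KK)."""
    return [a for matrix in coeffs for row in matrix for a in row]


def unflatten(genes: List[float], k: int) -> List[Matrix]:
    """Rebuild the P matrices of size K x K from a chromosome of length P*K*K."""
    if len(genes) % (k * k):
        raise ValueError("chromosome length must be a multiple of K*K")
    return [[genes[m + r * k: m + (r + 1) * k] for r in range(k)]
            for m in range(0, len(genes), k * k)]
```

With this layout the VAR order is implied by the chromosome length (P = length / K²), so changing the order means adding or removing a whole K×K block.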
VARGA - The Genetic Algorithm
GA with Extra Mutation: Order Mutation after Gene Mutation
Parents and Children (Both) Mutate
Genes are Bounded Natural Numbers
Fitness is the Negative Forecast Error (a Minimisation Problem)
Selection: Roulette Wheel
Run for EACH Patient
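Roulette wheel selection picks individuals with probability proportional to fitness, which requires non-negative weights. Since VARGA's fitness is the negative forecast error, the sketch below shifts the fitnesses so the worst individual gets a tiny positive weight; that shift is an assumption, not something stated in the slides. It relies on the standard-library `random.Random.choices` with its `weights` argument.

```python
import random


def roulette_select(population, errors, n, rng):
    """Fitness-proportionate (roulette wheel) selection for a minimisation
    problem. Fitness is -error; the fitnesses are shifted (an assumption) so
    that every weight is non-negative and the worst individual gets ~0."""
    fitness = [-e for e in errors]
    worst = min(fitness)
    weights = [f - worst + 1e-9 for f in fitness]
    return rng.choices(population, weights=weights, k=n)
```

For example, with forecast errors of 1.0 and 100.0 the low-error individual is chosen almost every time. Other transforms (such as 1/error) give the weaker individuals more chance and would change the selection pressure.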
Evaluation - Methods for Comparison
SPlus: Yule-Walker Equations, AIC and Whittle's Recursion, NK(P+1), Standard Package
Holt-Winters: Univariate Forecasting Method, Is the Data Univariate? (GA Solution)
Pure Noise Model: VAR(0), Worst-Case Forecast (Non-Differenced = 0)
54 of the Possible 82 Patients' VF Records Could Not Be Used: SPlus Implementation
Results - Graph Comparison
[Figure: scores for cases 0 to 6 (x-axis: case number; y-axis: score, 0 to 2000) for HW, S-Plus, VARGA and Noise.]
The lower the score, the better. The score is the one-step-ahead forecast error.
Results - Table Summary
Average = the average one-step forecast error for the 28 patients (both GAs' fitness); the lower, the better.

Method    Order (number of patients at each order)    Average Score
VARGA     26 of order 1, 2 of order 2                  559.82
S-Plus    12 of 0, 14 of 1, 1 of 2, 1 of 3             616.12
HW        N/A                                          683.79
Noise     28 of 0                                      816.53
Conclusion - Results
VARGA Has a Better Performance
VARGA Can Model Short MTS
The Visual Field Data is Definitely Multivariate
The Data Has a High Proportion of Noise
Conclusion - Remarks
Non-Linear Methods and Transformations
Performance Enhancements for the GA
Improve Crossover
Irregularly Spaced Methods
Space-Time Series Methods
Time-Dependent Relationships Between Variables
Generating Explanations in MTS
Useful to know probable explanations for a given set of observations within a time series
E.g. Oil Refinery: 'Why has a temperature become high whilst a pressure has fallen below a certain value?'
A possible paradigm which facilitates explanation is the Bayesian Network
Evolutionary Methods to learn BNs
Extend work to Dynamic Bayesian Networks
Dynamic Bayesian Networks
Static BNs repeated over t time slices
Contemporaneous / Non-Contemporaneous Links
Used for Prediction / Diagnosis within dynamic systems
$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \pi_i)$$
where π_i denotes the parents of X_i
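The factorisation means the joint probability of a full assignment is the product of each variable's conditional probability given its parents. A minimal sketch on a hypothetical two-node network (Rain → WetGrass, with made-up probabilities), not one of the refinery networks:

```python
# Hypothetical toy network Rain -> WetGrass. `parents` lists each variable's
# parents; `cpt` maps (value, tuple of parent values) to a probability.
parents = {"Rain": [], "WetGrass": ["Rain"]}
cpt = {
    "Rain": {(1, ()): 0.2, (0, ()): 0.8},
    "WetGrass": {(1, (1,)): 0.9, (0, (1,)): 0.1,
                 (1, (0,)): 0.1, (0, (0,)): 0.9},
}


def joint_probability(assignment: dict) -> float:
    """P(X_1..X_n) = product over i of P(X_i | parents of X_i)."""
    p = 1.0
    for var, pa in parents.items():
        key = (assignment[var], tuple(assignment[q] for q in pa))
        p *= cpt[var][key]
    return p
```

For instance, `joint_probability({"Rain": 1, "WetGrass": 1})` is 0.2 × 0.9 ≈ 0.18, and the four possible assignments sum to 1.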
Assumptions - 1
Assume all variables take at least one time slice to impose an effect on another.
The more frequently a system generates data, the more likely this will be true.
Contemporaneous links can be excluded from the DBN.
Variables at the same time t will be considered independent of one another.
Representation
P pairs of the form (ParentVar, TimeLag)
Each pair represents a link from a node at a previous time slice to the node in question at time t.
Examples:
Variable 1: {(1,1); (2,2); (0,3)}
Variable 4: {(4,1); (2,5)}
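The (ParentVar, TimeLag) representation can be held as a mapping from each variable to its list of pairs. The sketch below checks that every lag is at least one time slice (the first assumption, so no contemporaneous links) and lists the links it encodes, using the slide's two examples; the function names and the `n_vars`/`max_lag` bounds are hypothetical.

```python
from typing import Dict, List, Tuple

ParentPair = Tuple[int, int]  # (ParentVar, TimeLag)


def check_structure(structure: Dict[int, List[ParentPair]], n_vars: int,
                    max_lag: int) -> None:
    """Reject links that break Assumption 1 (lag of at least one time slice,
    hence no contemporaneous links) or refer to unknown variables/lags."""
    for var, pairs in structure.items():
        for parent, lag in pairs:
            if not 0 <= parent < n_vars:
                raise ValueError(f"variable {var}: unknown parent {parent}")
            if not 1 <= lag <= max_lag:
                raise ValueError(f"variable {var}: lag {lag} out of range")


def edges(structure: Dict[int, List[ParentPair]]) -> List[str]:
    """List each link as parent(t - lag) -> child(t)."""
    return [f"X{p}(t-{lag}) -> X{v}(t)"
            for v, pairs in sorted(structure.items()) for p, lag in pairs]


example = {1: [(1, 1), (2, 2), (0, 3)], 4: [(4, 1), (2, 5)]}
```

`edges(example)` yields five links, starting with `X1(t-1) -> X1(t)` and ending with `X2(t-5) -> X4(t)`.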
Search Space
Given the first assumption and the proposed representation, the search space for each variable (with N variables and lags from 1 to MaxLag) will be:
$$2^{\text{MaxLag} \cdot N}$$
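The size follows because each of the N·MaxLag candidate (ParentVar, TimeLag) pairs is either in or out of a variable's parent set. The sketch below checks this by brute force on a tiny case and then evaluates it for figures taken from the oil refinery slide (N = 11 variables, MaxLag = 120 minutes), which is an illustrative assumption about the settings actually used.

```python
from itertools import combinations, product


def search_space_size(n_vars: int, max_lag: int) -> int:
    """Number of possible parent sets for one variable: 2^(MaxLag * N)."""
    return 2 ** (n_vars * max_lag)


def count_parent_sets(n_vars: int, max_lag: int) -> int:
    """Brute-force count of all subsets of (ParentVar, TimeLag) pairs."""
    candidates = list(product(range(n_vars), range(1, max_lag + 1)))
    return sum(1 for r in range(len(candidates) + 1)
               for _ in combinations(candidates, r))
```

For N = 2 and MaxLag = 2 both functions give 16. For N = 11 and MaxLag = 120 the count is 2^1320, a number with 398 decimal digits, which is why exhaustive search is ruled out in favour of evolutionary or hill-climbing search.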
[Figure: architecture of the explanation system. Components: multivariate time series; structure search (evolutionary algorithms, hill climbing, etc.); parameter calculation given structure; dynamic Bayesian network library for different operating states; explanation algorithm (e.g. using stochastic simulation); user; generating synthetic data. Two stages are labelled (1) and (2).]
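For the "parameter calculation given structure" step, a common approach for discrete networks is maximum-likelihood estimation by counting; the slides do not say which estimator was used, so treat this as an assumed, unsmoothed sketch. Given a discretised series and a variable's (ParentVar, TimeLag) pairs, it tabulates the child's state against the lagged parent states.

```python
from collections import Counter, defaultdict


def estimate_cpt(series, child, pairs):
    """series[t][v] is the discrete state of variable v at time t; pairs is a
    list of (ParentVar, TimeLag). Returns P(child state | parent states) as
    relative frequencies (maximum-likelihood estimate, no smoothing)."""
    max_lag = max((lag for _, lag in pairs), default=0)
    counts = defaultdict(Counter)
    for t in range(max_lag, len(series)):
        config = tuple(series[t - lag][p] for p, lag in pairs)
        counts[config][series[t][child]] += 1
    return {config: {state: n / sum(c.values()) for state, n in c.items()}
            for config, c in counts.items()}
```

Parent configurations that never occur in the data get no entry, so a practical version would add smoothing or a prior.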
Oil Refinery Data
Data recorded every minute
Hundreds of variables
Selected 11 interrelated variables
Discretised each variable into k states
Large time lags (up to 120 minutes between some variables)
Different operating states
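The slides do not say which discretisation policy was used (the Future Work slide lists different policies as an open question). As one assumed example, equal-frequency discretisation ranks the readings and splits the ranks into k groups of roughly equal size.

```python
def discretise(values, k):
    """Equal-frequency discretisation into states 0..k-1 by rank.
    Ties may be split across neighbouring states."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    states = [0] * len(values)
    for rank, i in enumerate(order):
        states[i] = rank * k // len(values)
    return states
```

For example, `discretise([5.0, 1.0, 3.0, 2.0, 4.0, 6.0], 3)` returns `[2, 0, 1, 0, 1, 2]`. Equal-width bins are the usual alternative; they keep the original value scale but can leave some states almost empty.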
Results
[Figure: network over the refinery variables SOT, FF, TGF, TT and RinT.]
Explanations - using Stochastic Simulation
[Figure: P(y = 1) plotted against time-x (1 to 97) for SOF-SP, SOT, TT, BPF-SP and BPF.]
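Stochastic simulation estimates the probability of an explanation by sampling the network. The sketch below uses logic (rejection) sampling, one simple form of stochastic simulation, on a hypothetical two-node network Cause → Effect with made-up probabilities: it samples forward, discards samples that contradict the observed evidence, and reports how often the cause was present. It illustrates the mechanism only and is not the authors' explanation algorithm.

```python
import random


def logic_sampling(n_samples: int, rng: random.Random) -> float:
    """Estimate P(Cause = 1 | Effect = 1) for a hypothetical network
    Cause -> Effect by forward sampling and rejecting samples that do not
    match the evidence Effect = 1."""
    accepted = cause_true = 0
    for _ in range(n_samples):
        cause = rng.random() < 0.3                       # P(Cause = 1) = 0.3
        effect = rng.random() < (0.9 if cause else 0.2)  # P(Effect = 1 | Cause)
        if effect:                                       # keep samples matching the evidence
            accepted += 1
            cause_true += cause
    return cause_true / accepted
```

The exact posterior here is 0.27 / 0.41 ≈ 0.66, and the estimate approaches it as `n_samples` grows. In a DBN the same idea ranks candidate states of earlier time slices, producing explanations like "t - 7 : Top Temperature in_state 0 with probability=0.92".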
Explanation
Input (observations):
t - 0 : Tail Gas Flow in_state 0
t - 3 : Reboiler Temperature in_state 1
Output (explanation):
t - 7 : Top Temperature in_state 0 with probability=0.92
t - 54 : Feed Rate in_state 1 with probability=0.71
t - 75 : Reactor Temperature in_state 0 with probability=0.65
Future Work
Exploring the use of different searches and metrics
Improving accuracy (e.g. different discretisation policies, continuous DBNs)
Using the library of DBNs in order to quickly classify the current state of a system
Automatically Detecting Changing Dependency Structure
Acknowledgements
BBSRC
BP-AMOCO
British Council for Prevention of Blindness
EPSRC
Honeywell Hi-Spec Solutions
Honeywell Technology Center
Institute of Ophthalmology
Moorfields Eye Hospital
MRC
Intelligent Data Analysis
X Liu
Department of Computer Science
Birkbeck College
University of London
Intelligent Data Analysis
An interdisciplinary study concerned with effective analysis of data
Intelligent application of data analytic tools
Application of "intelligent" data analytic tools
IDA Requires
Careful thinking at every stage of an analysis process (strategic aspects)
Intelligent application of relevant domain knowledge
Assessment and selection of appropriate analysis methods
IDA Conferences
IDA-95, Baden-Baden
IDA-97, London
IDA-99, Amsterdam
IDA-2001, Lisbon
IDA in Medicine and Pharmacology
IDAMAP-96, Budapest
IDAMAP-97, Nagoya
IDAMAP-98, Brighton
IDAMAP-99, Washington DC
IDAMAP-2000, Berlin
Other IDA Activities
IDA Journal (Elsevier 1997)
Journal Special Issues (1997-)
Introductory Books (Springer 1999)
The Dagstuhl Seminar (Germany 2000)
European Summer School (Italy 2000)
Special Sessions at Conferences
Concluding Remarks
Strategies for data analysis and mining
Strategies for human-computer collaboration in IDA
Principles for exploring and analysing "big data"
Benchmarking interesting real-world data sets as well as computational methods
A long term interdisciplinary effort
The Screening Architecture
Results from a GP Clinic