Pattern Matching in DAME using AURA technology
Jim Austin, Robert Davis, Bojian Liang, Andy Pasley
University of York
Distributed Aircraft Maintenance Environment - DAME
Overview
• Context
• AURA technology
• DAME pattern matching problem
• AURA solution
• Search performance
• Next steps
Context
• Vibration data from all engines in flight
• Detection of unusual vibration patterns
  – Novelties, anomalies
  – Automatic or manual
• Search for similar vibration behaviour
  – Need to search large volumes of historical vibration data
• Investigate search results and associated data
  – Service data records
  – CBR tools: Sheffield
AURA technology
• AURA
  – Proven technology for searching large data sets
  – Ability to scale and maintain performance
  – Easily parallelised
• Examples
  – Address matcher
  – Molecular matcher
• Operation
  – Vectors compared to stored examples
  – Uses bit-level comparison methods
  – Correlation Matrix Memory operations
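The Operation bullets can be illustrated with a toy binary correlation matrix memory. This is a minimal sketch of the general CMM idea, not the AURA library or its API: the class name `BinaryCMM`, its methods, and the thresholding rule are assumptions made for illustration.

```python
class BinaryCMM:
    """Toy binary correlation matrix memory (illustrative, not the AURA API)."""

    def __init__(self, n_inputs, n_outputs):
        # One integer bitmask of output bits per input line.
        self.rows = [0] * n_inputs
        self.n_outputs = n_outputs

    def store(self, input_bits, output_bits):
        # Superimpose (bitwise OR) the association between the set input
        # bits and the set output bits onto the matrix.
        out_mask = sum(1 << j for j in output_bits)
        for i in input_bits:
            self.rows[i] |= out_mask

    def recall(self, input_bits, threshold=None):
        # Sum, column by column, the rows selected by the set input bits.
        sums = [0] * self.n_outputs
        for i in input_bits:
            for j in range(self.n_outputs):
                sums[j] += (self.rows[i] >> j) & 1
        # Assumed rule: by default keep only columns that matched every
        # set input bit; a lower threshold allows partial matches.
        if threshold is None:
            threshold = len(input_bits)
        return sums, [j for j, s in enumerate(sums) if s >= threshold]


cmm = BinaryCMM(n_inputs=6, n_outputs=4)
cmm.store([0, 2], [1])   # pattern A is associated with output 1
cmm.store([2, 4], [3])   # pattern B is associated with output 3
sums, matches = cmm.recall([0, 2])
print(sums, matches)
```

Lowering `threshold` in `recall` returns stored examples that share only some bits with the input, which is one way to picture imprecise matching against stored examples.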
AURA architecture
[Figure: AURA architecture. Labelled components: Data Adaptor, Indexer, AURA Search Engine (binary input pattern, output pattern), Candidate Selector, Candidate Engine (Back check), Result Store. Labelled paths: Store, Search, Store & Search, Indexes or Data, Results.]
AURA storage & recall
[Figure: AURA storage and recall. Labels: binary Input pattern, Correlation Matrix Memories, AURA Search Engine, Output pattern. Numeric values shown in the figure: 2 1 20 0 0 0.]
AURA software
• AURA re-designed
  – To improve performance of the AURA library in terms of both memory usage and search times
    • 3-fold reduction in memory
    • 3-fold reduction in search time
  – To make the library easy to use
    • Simple API
    • Typically only 4 or 5 API calls used
    • Enable implementation as an OGSI GT3 service
  – To engineer the library to commercial software standards
    • Comprehensive user guide and reference manual
Pattern matching problem
• Vibration data from sensors forms Z-mod data.
• Tracked orders extracted from Z-mod data
[Figure: Z-mod data plotted as Frequency against Time with a tracked order marked; the tracked order plotted as Amplitude against Time.]
Pattern matching problem
• Novelty or anomaly identified in tracked order data by feature detectors
  – Forms the query sub-sequence
Pattern matching problem
• Search for sub-sequences similar to the query in a large volume of tracked order data.
  – Need to investigate all possible alignments
  – Benchmark method is sequential scan
  – Noisy data: imprecise matching required
  – Various possible similarity measures
    • Euclidean distance
    • Correlation
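The benchmark sequential scan can be sketched as follows. This is a minimal illustration using Euclidean distance only; the function name `sequential_scan`, the parameter `k`, and the sample data are assumptions, not taken from the DAME implementation.

```python
import math


def sequential_scan(series, query, k=3):
    """Compare the query with every alignment and return the k closest.

    Returns a list of (distance, offset) pairs, smallest distance first.
    """
    m = len(query)
    results = []
    for offset in range(len(series) - m + 1):
        window = series[offset:offset + m]
        dist = math.sqrt(sum((w - q) ** 2 for w, q in zip(window, query)))
        results.append((dist, offset))
    results.sort()
    return results[:k]


# Made-up tracked order values and a three-point query.
series = [0, 0, 1, 3, 1, 0, 0, 1, 3, 2, 0]
query = [1, 3, 1]
print(sequential_scan(series, query, k=2))
```

Every alignment costs one full distance computation over the query length, which is why the cost grows with both the volume of stored data and the size of the query.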
AURA solution
[Figure: AURA solution data flow. Labels: Stored Time Series, Encoded Time Series, Query Time Series, Encoded Query, AURA Search Engine, Candidate Matches, AURA Backcheck, Results.]
AURA solution
• Encoding: reduction in dimensionality
  – e.g. from 100 points to 10 values.
• Approximate search
  – From ~1,000,000s of alignments down to ~1,000s of candidate matches
• Backcheck
  – From ~1,000s of candidate matches to 100 or fewer results
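The three stages above (encode, approximate search, backcheck) follow a filter-and-refine pattern, sketched below. Important assumption: the approximate stage here is a plain distance test on frame-mean codes, used as a stand-in because it is short and self-contained; it is not the AURA correlation matrix memory search. All function names and the sample data are hypothetical, and the query length must be a multiple of `n_frames`.

```python
import math


def paa(seq, n_frames):
    """Encode a sequence as the mean of each of n_frames equal frames."""
    frame = len(seq) // n_frames
    return [sum(seq[i * frame:(i + 1) * frame]) / frame for i in range(n_frames)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_index(series, m, n_frames):
    """Encode every alignment of length m once, ahead of query time."""
    return [paa(series[o:o + m], n_frames) for o in range(len(series) - m + 1)]


def search(series, index, query, n_frames, radius):
    m = len(query)
    scale = math.sqrt(m // n_frames)
    q_code = paa(query, n_frames)
    # Approximate search: cheap test on the short codes. The scaled code
    # distance never exceeds the true Euclidean distance, so no alignment
    # within `radius` is discarded at this stage.
    candidates = [o for o, code in enumerate(index)
                  if scale * euclidean(code, q_code) <= radius]
    # Backcheck: exact distance on the raw data, for candidates only.
    checked = sorted((euclidean(series[o:o + m], query), o) for o in candidates)
    return candidates, [r for r in checked if r[0] <= radius]


series = [0, 0, 1, 3, 3, 1, 0, 0, 3, 1, 1, 3, 0, 0]
query = [1, 3, 3, 1]
index = build_index(series, m=len(query), n_frames=2)
candidates, results = search(series, index, query, n_frames=2, radius=1.5)
print(candidates, results)
```

In this toy run the approximate stage keeps two alignments and the backcheck rejects one of them, which mirrors the narrowing from candidate matches to final results.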
Encoding technique
• Piecewise Aggregate Approximation
• Values encoded using integer bins
[Figure: example of the encoding; axes labelled X-Axis and Y-Axis.]
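A minimal sketch of this encoding, reducing 100 points to 10 integer values. The uniform bin edges, the value range, and the names `paa` and `to_bins` are assumptions for illustration; the slides do not say how the bins are chosen.

```python
def paa(segment, n_frames):
    """Piecewise Aggregate Approximation: mean of each equal-length frame."""
    frame = len(segment) // n_frames
    if frame == 0 or frame * n_frames != len(segment):
        raise ValueError("segment length must be a positive multiple of n_frames")
    return [sum(segment[i * frame:(i + 1) * frame]) / frame
            for i in range(n_frames)]


def to_bins(values, lo, hi, n_bins):
    """Map each value to an integer bin index in 0..n_bins-1.

    Uniform bins over [lo, hi] are assumed; values outside the range are
    clamped to the first or last bin.
    """
    width = (hi - lo) / n_bins
    return [min(n_bins - 1, max(0, int((v - lo) / width))) for v in values]


segment = list(range(100))            # 100 made-up data points
means = paa(segment, n_frames=10)     # 10 frame means
codes = to_bins(means, lo=0, hi=100, n_bins=10)
print(means)
print(codes)
```

Integer bin indices are convenient because each one can be turned into a set bit in a binary input vector, which suits bit-level comparison.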
Search efficiency
• Approximate search using AURA
  – Fast method of discarding poor matches
  – AURA search typically an order of magnitude or more faster than sequential scan.
  – Candidate matches typically <1% of total.
  – Back check stage very efficient due to reduction in volume of data
    • typically 1% or less of processing time for full sequential scan.
Data size
• Assume
  – Fleet of 100 aircraft, 4 engines each
  – Flying 10 hours per day
  – 5 data points per tracked order per second
  – 4 bytes per data point
• Totals
  – approx. 100 gigabytes per year per tracked order
  – Roughly 10 tracked orders of interest so…
    • Total approx. 1 terabyte per year
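The totals can be checked with a few lines of arithmetic. The figures come from the assumptions listed above; the 365 flying days per year and decimal units (1 GB = 10^9 bytes) are assumptions added here.

```python
AIRCRAFT = 100
ENGINES_PER_AIRCRAFT = 4
FLIGHT_HOURS_PER_DAY = 10
POINTS_PER_SECOND = 5       # per tracked order, per engine
BYTES_PER_POINT = 4
TRACKED_ORDERS = 10
DAYS_PER_YEAR = 365         # assumption: aircraft fly every day

# Data points recorded per tracked order per year across the fleet.
points_per_year = (AIRCRAFT * ENGINES_PER_AIRCRAFT * FLIGHT_HOURS_PER_DAY
                   * 3600 * POINTS_PER_SECOND * DAYS_PER_YEAR)
bytes_per_order = points_per_year * BYTES_PER_POINT
total_bytes = bytes_per_order * TRACKED_ORDERS

print(f"points per tracked order per year: {points_per_year:,}")
print(f"per tracked order: {bytes_per_order / 1e9:.1f} GB per year")
print(f"all tracked orders: {total_bytes / 1e12:.2f} TB per year")
```

Under these assumptions the result is about 105 GB per tracked order per year and about 1.05 TB in total, in line with the rounded figures on the slide.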
Search performance
• Deployed system assumptions
  – 100 CPUs, 2 GHz each with 1 GByte RAM.
    • One per aircraft
  – Each search needs to check 25,000,000,000 alignments of the query per year of tracked order data.
• Sequential scan
  – Measured at approx. 2 seconds for 5,000,000 alignments of a 100 data point query (one CPU).
  – Extrapolates to approx. 500 seconds to search 5 years of data assuming 1 CPU per aircraft
  – This is too slow! Need to support multiple searches and searches on more than one tracked order.
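The 500 second figure follows from the measured scan rate, as this short calculation shows. It assumes the work divides evenly across the CPUs with no communication overhead; the function name `scan_seconds` is made up for this sketch.

```python
def scan_seconds(alignments_per_year, years, cpus, rate_per_cpu):
    """Estimated wall-clock time for a scan split evenly across CPUs."""
    return alignments_per_year * years / cpus / rate_per_cpu


# Measured: 5,000,000 alignments in about 2 seconds on one CPU.
rate = 5_000_000 / 2.0

t = scan_seconds(alignments_per_year=25_000_000_000, years=5,
                 cpus=100, rate_per_cpu=rate)
print(f"sequential scan of 5 years of data: about {t:.0f} seconds")
```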
Search performance
• Using AURA and PAA based approach
  – Search time reduced by approx. an order of magnitude.
  – Can search 5 years of data for 100 aircraft in approx. 50 seconds
  – Believe this to be a workable solution
  – But response times potentially slower than this
    • Need to handle a number of searches in parallel
    • Communications and other overheads
Next steps
• Technology
  – Refine similarity measures and encoding methods.
• Architecture
  – Develop additional services to distribute and organise the search
  – Support multiple searches in parallel
• Measurement
  – Perform scaling trials on engine data
  – Obtain better estimates of overall performance
    • Multiple searches
    • Overheads