32
Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York

Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York

Embed Size (px)

Citation preview

Page 1: Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York

Pattern Matching against Distributed Datasets within DAME

Andy Pasley

University of York

Page 2: Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York

Distributed Aircraft Maintenance Environment - DAME

Overview

• Distributed Aircraft Maintenance Environment (DAME) project

• Vibration data and search problem• AURA strategy• Architecture and storage• Demonstration using signal data explorer• Future challenges

Page 3: Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York

Distributed Aircraft Maintenance Environment - DAME

Project Partners

EPSRC Funded, £3.2 Million, 3 years, commenced Jan 2002.UK pilot project for e-Science4 Universities:

– University of York, Dept of Computer Science– University of Sheffield, Dept of Automatic Control and Systems

Engineering – University of Oxford, Dept of Engineering Science– University of Leeds, School of Computing and School of

Mechanical Engineering

Industrial Partners:– Rolls-Royce– Data Systems and Solutions– Cybula Ltd

Page 4: Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York

Distributed Aircraft Maintenance Environment - DAME

Operational Scenario

Engine flight data

Airline office

MaintenanceCentre

London AirportNew York Airport

GRID DiagnosticsCentre

Engine flight data

European data centre

Engine flight data

Airline office

MaintenanceCentre

London AirportNew York Airport

US data centre

GRID DiagnosticsCentre

Engine flight data

Page 5: Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York

Distributed Aircraft Maintenance Environment - DAME

DAME Grid Challenges

Building a demonstration system as proof of concept for Grid technology in the aerospace diagnostic domain

Two primary Grid challenges:– Management of large, distributed and heterogeneous data

repositories– Rapid data mining and analysis of fault data

Other key (commercial) issues:– Remote, secure access to flight data and other operational

data and resources– Management of distributed users and resources– Quality of Service issues (and Service Level Agreements)

Page 6: Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York

Distributed Aircraft Maintenance Environment - DAME

Demonstrator

Fully operational system on the WRG– Demonstrated the basic system architecture and main

services

Maintenance Analyst Maintenance Engineer

Toolbench

Page 7: Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York

Distributed Aircraft Maintenance Environment - DAME

White Rose Grid Distribution

Leeds

Grid Middleware Services

Sheffield

Modeling & Decision Support

DAME WRGSign-on Portal

SDMDatabase

CBRAnalysis-GEngineModel-G

GT3 ServiceCBR advisor

GT3 ServiceBD25 Enginemodel wrappedas Grid Service

XTO-G

GT3 ServiceXTO plug-ins via a GridService

DataVisualiser

GT3 ServiceJchart Viewer forviewing XTO output

Workflow

Browser basedworkflow tool.Compliant withResource Broker

Resource BrokerGT2 ServiceSchedule workflowtasks on WRGresource

Oxford

Engine Data Store

Engine DataDatabase

York

Data Mining Services

AURA-GAURA-GDatabase

DataStore-GGT3 ServiceSimulates arrival &storage of QUOTEdata

Zmod Viewer

GT3 ServiceBrowser based dataviewer for zmod files

GT3 ServiceZmod datasearch facility

Collaboration tools

GT3 ServiceToolset for multiusercollaboration

WRGGT3/2

WRGGT3/2

WRGGT3/2

WRGGT3/2

DAME workbench

SecurityGT3 SecurityServiceProxy-Management

DAME GUI

GT3 ServiceBrowser based GUIto DAME services

Page 8: Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York

Vibration data and search problem

Page 9: Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York

Distributed Aircraft Maintenance Environment - DAME

Z-mod Data

• Vibration data from sensors forms Z-mod data.• Tracked orders extracted from Z-mod data

Fre

quen

cy

Time

Trackedorder

TimeA

mpl

itude

Page 10: Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York

Distributed Aircraft Maintenance Environment - DAME

Pattern Matching Problem

• Collected vibration data from all engines in flight

• Detect unusual events on recent flights– QUICK on wing statistical classifier system

Search for similar events on other engines– Uses AURA pattern matching methods to search large

vibration data sets

• Reason using historical data and search results– CBR tools which access service records

Page 11: Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York

Distributed Aircraft Maintenance Environment - DAME

Novelty

• Novelty or anomaly identified in tracked order data by QUICK

Forms ‘query’ sub-sequence

Page 12: Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York

Distributed Aircraft Maintenance Environment - DAME

Search Problem

• Search for sub-sequences similar to the query in a large volume of tracked order data.– Need to investigate all possible alignments– Benchmark method is sequential scan– Noisy data: imprecise matching required– Various possible similarity measures

• Euclidian distance

• Correlation

Page 13: Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York

Distributed Aircraft Maintenance Environment - DAME

AURA

• Family of generic techniques for pattern matching using Correlation Matrix Memories (CMMs)

• Proven technology for searching large data sets– Sclable high performance– Find exact and near-matches– Wide range of data types– Can be parallelised

• Operation– Takes vectors and compares them to stored examples– Uses bit level comparison methods and binary matrix

operations.

Page 14: Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York

Distributed Aircraft Maintenance Environment - DAME

AURA Technology

Data

Dat

a A

dapt

orS

tore

Se

arc

h

Inp

ut

pa

tte

rn

Candidate Engine(Back check)

Indexer

Output pattern

AURASearchEngine

Results

binary

Store & Search

Store &

Search

Indexes or Data

ResultStore

AURA Search Process

Page 15: Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York

Distributed Aircraft Maintenance Environment - DAME

Data Storage & Recall

Inp

ut

pa

tte

rn

Output pattern

AURASearchEngine

binary

2 1 20 0 0 0

* *Correlation Matrix Memories

Page 16: Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York

Distributed Aircraft Maintenance Environment - DAME

AURA Encoding

• Application specific encoding required for efficient searching– Similarity metric– Integer ‘bins’– Reduction in dimensionality– Can integrate traditional methods

Page 17: Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York

Distributed Aircraft Maintenance Environment - DAME

Performance

• Fast method of discarding poor matches• AURA search roughly 30x faster than sequential

scan• Candidate matches typically <1% of total• Back check stage very efficient due to reduction in

volume of data– Typically 1% or less of processing time for full sequential

scan.

Page 18: Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York

Distributed Aircraft Maintenance Environment - DAME

Architecture and Storage

• Terabytes per year of raw zmod data– Access is required by many DAME services

• 1Tb per year of tracked orders that need to be searched against– Access required by Signal Data Explorer

• Observed in a distributed manner– Delivery to a central repository makes high bandwidth

requirements

Page 19: Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York

Distributed Aircraft Maintenance Environment - DAME

Objectives

• Distributed search– Transparent

• Distribution of search and collation of results

– Efficient• Use of processing and communications resources

– Extensible• Permit addition / removal of resource

– Concurrent• Support multiple simultaneous searches

Page 20: Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York

Distributed Aircraft Maintenance Environment - DAME

Objectives

• Generic mechanisms – Suitable for different types of time series data and a variety

of search methods

• Robust architecture– Graceful degradation when some components unavailable– Provision of intermediate results before all searching

completed

Page 21: Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York

Distributed Aircraft Maintenance Environment - DAME

Design

• Pattern match controller (PMC) service– Controls distribution and collation of the search– Generic service– Simple interface– Minimal communications overheads

• Pattern matching service– Performs the search– Can be implemented in a variety of ways– Conforms to a simple API

• Storage resource broker (SRB)– Used to store and retrieve data and metadata– Provides a single logical view onto all stored data

Page 22: Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York

Distributed Aircraft Maintenance Environment - DAME

Physical Architecture

Pattern Match Controller

Pattern Matching Service

SRB

Search Node

Pattern Match Controller

Pattern Matching Service

SRB

Search Node

Pattern Match Controller

Pattern Matching Service

SRB

Search Node

Pattern Match Controller

Pattern Matching Service

SRB

Search Node

Pattern Match Controller

Pattern Matching Service

SRB

Search Node

Pattern Match Controller

Pattern Matching Service

SRB

Search Node

Pattern Match Controller

Pattern Matching Service

SRB

Search Node

Pattern Match Controller

Pattern Matching Service

SRB

Search Node

Pattern Match Controller

Pattern Matching Service

SRB

Search Node

Pattern Match Controller

Pattern Matching Service

SRB

Search Node

Data Explorer

Work Station

Grid

Computers on the Grid

Page 23: Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York

Distributed Aircraft Maintenance Environment - DAMEHeathrow

Signal Data Explorer(Client Application)

Master Node

SRB

Slave Node

Gatwick

Search(…)

Search(…)

NodeSearch(…)PMC

Pattern MatchEngine

SRB

Search(…)

PMC

Pattern MatchEngine

Search Process

Page 24: Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York

Distributed Aircraft Maintenance Environment - DAMEHeathrow

Signal Data Explorer(Client Application)

Master Node

SRB

Slave Node

Gatwick

Pattern MatchEngine

SRB

PMC

Pattern MatchEngine

Search Process

ReturnResults(…)

ReturnResults(…)

ReturnResults(…)GetResults(…)

PMC

Page 25: Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York

Distributed Aircraft Maintenance Environment - DAME

Implementation

• PMC– Java GT3 Grid service– Hosted within a Tomcat 4.1.24 installation

• Pattern matching service– Communicates with PMC using proprietary encoding– Uses SRB client library to access data

• Storage resource broker (SRB)– SRB server running at all WRG sites– Single metadata catalogue (MCAT) hosted at York

Page 26: Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York

Distributed Aircraft Maintenance Environment - DAME

APIs

• Client developers need only use simple store(), search() and getResults() API calls.

• Pattern matching service developers need only implement a simple interface of search() and store(), and use the returnResults() API call.

Pattern Match Control (PMC)‘master’ node

Pattern Matching Service

Search()Store() ReturnResults()

NodeSearch()

Signal DataExplorer

Storage Service

Extractor

PMC

PMC

Other PMCnodes

ReturnResults()

SearchConstraints

Builder

Raw Data

SRB

Engine Data

Tracked Orders

Store()

Scope

Constraints

Search()

Stored Data

Raw Engine Data

GetResults()

Engine Data

Page 27: Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York

Distributed Aircraft Maintenance Environment - DAME

Summary

• Transparent – PMC distributes search to multiple pattern matching

services.– Results collated and returned to Data Explorer

• Efficient – PMC has minimal overheads– SRB handles used to identify results – minimal

communications bandwidth required for search

Page 28: Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York

Distributed Aircraft Maintenance Environment - DAME

Summary

• Extensible – PMC uses a distributed catalogue of other PMC locations

Permits simple addition/removal of search nodes

• Concurrent – PMC uses unique search ids based on ‘master’ PMC id– Results kept for a time to allow access from other

workstations

Page 29: Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York

Distributed Aircraft Maintenance Environment - DAME

Summary

• Generic mechanisms – PMC interface independent of type of time series data searched or

algorithms used

– Generic SRB handles used to identify data to search and results

• Robust architecture – High availability as clients may use any PMC node as ‘master’ for

a search.

– Temporary results built up and may be accessed before entire search complete

– Partial results in event of unavailable nodes

– Automatic clean-up after timeout

Page 30: Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York

Distributed Aircraft Maintenance Environment - DAME

Signal Data Explorer

• Tool to allow investigation of data outside of an automatic workflow by a domain expert

• Accesses local data stores or remote (distributed) data sets and searching services.

• For this demo, searches against data held on the White Rose Grid at York, Leeds and Sheffield

Page 31: Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York

Distributed Aircraft Maintenance Environment - DAME

Next Steps

• Scaling trials on engine data– Realistic number of concurrent users supported– Investigate performance as the number of nodes and/or

volume of data is increased

• Compare overhead in search time / network requirements to a centralised architecture

• Federated MCATs– Create several SRB ‘zones’ each with a metadata

catalogue

Page 32: Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York

Distributed Aircraft Maintenance Environment - DAME

Complex Search Requests

• Search Requests– May contain several query patterns to be matched against.

The results for one pattern may constrain the search space for another query pattern

– If treated as several individual queries may not be processed efficiently

• A ‘result’ may consist of data stored at more than one node– Slave nodes may be required to issue ‘sub-searches’ to

other nodes in the system