Page 1: Ernestina Menasalvas Ruiz Pedro Sousa. GOAL Extract knowledge from aviation data sources to obtain patterns that help detection of incidents Learn behaviour

Ernestina Menasalvas Ruiz Pedro Sousa




GOAL

• Extract knowledge from aviation data sources to obtain patterns that help detect incidents

• Learn behaviour models


What is Data Mining?

• Many definitions:
– Non-trivial extraction of implicit, previously unknown and potentially useful information from data
– Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns

© Tan, Steinbach, Kumar, Introduction to Data Mining, 4/18/2004

KDD process






Data Preparation






• Data integration
– Aircraft information
– Context: sensors, space weather, location, weather
– Operations: pre-flight, departure, climb, en route, arrival, taxiing, post-flight
– Aviation safety reports

• Dynamic and complex data:
– Theoretical and practical aspects of the algorithms have to be analyzed to discover the most appropriate techniques:
  • trend analysis, association of events, data stream methods, context integration, resource awareness


GOAL (cont)

• Apply algorithms to mine the various data sources for information
– to identify patterns:
  • atypical flights
  • anomalous cockpit procedures
  • groups of safety reports

• BUT:
– KDD is a process

• Static vs dynamic


KDD process

Approx. 80% of the effort


Data Exploration and transformation

• Exploration of the data to better understand its characteristics.
– Helping to select the right tool for preprocessing or analysis
– Making use of humans’ abilities to recognize patterns
– Integrating the semantics of the data
– Clustering and anomaly detection will be used as exploratory techniques

• Transform data prior to mining so as to be able to extract the useful patterns


Data Mining Tasks

• Prediction (supervised learning)
– Use some historical information to learn a model that can help to predict unknown or future values of some variable.
– Basis for forecasting
  • Classification
  • Regression
  • Deviation detection

• Description (unsupervised)
– Find patterns that describe the data
– Clustering
– Association rule discovery
– Sequential pattern discovery



Classification

• Given a collection of records in which the class is known:
– Find a model able to describe the class given the values of the rest of the attributes.
• Measurements have to be used to validate the model and determine the accuracy of prediction
– Train and test
• Techniques
– Induction trees
  • C4.5, ID3
  • Very efficient in terms of execution time
  • Very intuitive results
– Neural networks
  • The result is a neural network: a black box
  • Robust
  • Not intuitive
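To make the induction-tree idea more concrete: ID3 chooses the attribute to split on by information gain (C4.5 uses the related gain ratio). The sketch below computes that gain on a toy data set. The records and the attribute names (`phase`, `turbulence`, `incident`) are invented for illustration and do not come from the slides.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(records, attribute, target):
    """Entropy reduction obtained by splitting `records` on `attribute`."""
    base = entropy([r[target] for r in records])
    remainder = 0.0
    for value in {r[attribute] for r in records}:
        subset = [r[target] for r in records if r[attribute] == value]
        remainder += len(subset) / len(records) * entropy(subset)
    return base - remainder

# Hypothetical toy records; attribute names and values are invented.
flights = [
    {"phase": "climb", "turbulence": "high", "incident": "yes"},
    {"phase": "climb", "turbulence": "low", "incident": "no"},
    {"phase": "cruise", "turbulence": "high", "incident": "yes"},
    {"phase": "cruise", "turbulence": "low", "incident": "no"},
]

gains = {a: information_gain(flights, a, "incident") for a in ("phase", "turbulence")}
best_attribute = max(gains, key=gains.get)  # attribute a tree learner would split on first
```

In this toy set `turbulence` separates the classes perfectly while `phase` carries no information, so the first split would be on `turbulence`. A full learner would recurse on each branch and then be validated on held-out test records.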



Clustering

• Given a set of (unclassified) records, group the records in such a way that:
– records in one cluster are more similar to one another.
– records in separate clusters are less similar to one another.
• Similarity measures have to be defined:
– Special attention to understanding the distance
• Approaches
– Partitioning algorithms: they first build different partitions and then these partitions are evaluated:
  • K-means
– Hierarchical: they build a hierarchical decomposition
– Density based: density functions are used
– Kohonen networks [Kohonen ‘95]
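As an illustration of the K-means approach, here is a minimal sketch that alternates the assignment and update steps until the centroids stop moving. Euclidean distance, the starting centroids and the sample points are all assumptions made for the example; the slides do not prescribe a similarity measure.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means on tuples of floats, starting from the given centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical two-dimensional records forming two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
final_centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

The result depends on the distance function and on the initial centroids, which is why the choice of similarity measure deserves the attention the slide asks for.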


Association Rule Discovery

• Given a set of records described by a set of attributes:
– Find associations in values of attributes
– Once associations are discovered, rules can be obtained
– Confidence vs support
– Apriori algorithm

At1=1 and At3=1 and At4=1

At1 At2 At3 At4 At5 At6 At7
 0   1   0   1   1   0   0
 0   0   0   0   1   0   0
 1   0   1   1   0   0   0
 1   0   1   1   0   1   1
 0   0   0   0   0   0   1
 0   1   0   1   1   0   1
 1   1   1   1   1   0   0
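Support and confidence can be computed directly from the binary table on this slide. The sketch below assumes the 49 values are read row by row into seven records with columns At1 to At7; that reading order is an assumption, since the slide layout did not survive extraction.

```python
# The binary table from the slide, read row by row (columns At1..At7).
# The reading order is an assumption.
rows = [
    [0, 1, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 0, 0],
    [1, 0, 1, 1, 0, 0, 0],
    [1, 0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 1, 0, 1, 1, 0, 1],
    [1, 1, 1, 1, 1, 0, 0],
]

def support(itemset):
    """Fraction of records in which every attribute index in `itemset` equals 1."""
    hits = sum(all(row[i - 1] == 1 for i in itemset) for row in rows)
    return hits / len(rows)

def confidence(antecedent, consequent):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    return support(antecedent | consequent) / support(antecedent)

itemset_support = support({1, 3, 4})        # At1=1 and At3=1 and At4=1
rule_confidence = confidence({1, 3}, {4})   # rule: At1=1 and At3=1 -> At4=1
reverse_confidence = confidence({4}, {1})   # rule: At4=1 -> At1=1
```

Under that reading, the itemset At1, At3, At4 appears in 3 of the 7 records. Apriori would enumerate such frequent itemsets level by level, pruning any candidate that has an infrequent subset, and rules would then be ranked by confidence.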


Challenges of the algorithms

• Algorithms to find anomalies in large datasets have to be:
– fast
– scalable
– accurate

• Algorithms have to be able to deal with:
– continuous sequences, representing sensor data such as airspeed and altitude
– discrete sequences, such as sequences of pilot switch presses.
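For the continuous case, one very simple baseline is to flag values that lie far from the mean of the sequence. The sketch below is only that baseline, not the method proposed in the slides. The airspeed values and the 2.5 standard deviation threshold are invented for illustration.

```python
from statistics import mean, stdev

def zscore_anomalies(values, threshold=2.5):
    """Return the indices of values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # a constant sequence has no outliers under this rule
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical airspeed samples (knots) with one obvious spike.
airspeed = [250, 251, 249, 252, 250, 248, 251, 250, 249, 310, 250, 251]
anomalous_indices = zscore_anomalies(airspeed)
```

This needs the whole sequence in memory and takes two passes over it, so it suits static data only. Discrete sequences such as switch presses call for different techniques, for example models of which transitions between events are rare.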


Data streams vs static data


Data streams

A data stream:
- is potentially unbounded in size
- needs to be analyzed over
- arrives at a very high rate
- and its underlying model evolves over time

Challenges for algorithms:
- Processing data in a single pass.
- Generating models in an incremental way.
- Ability to detect model changes over time.
- Limited usage of memory and computing time.
- Possibility of automating the evaluation process.
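The single-pass, incremental and bounded-memory requirements listed above can be illustrated with Welford's algorithm. It keeps a running mean and variance in constant memory and never revisits old values. The class name and the altitude readings are assumptions made for the example.

```python
class RunningStats:
    """Incremental mean and variance (Welford's algorithm) in O(1) memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Consume one value from the stream; the value is not stored."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance of the values seen so far (0.0 with fewer than two)."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0

stats = RunningStats()
# Hypothetical altitude readings (feet) arriving one at a time.
for altitude in (10000.0, 10050.0, 9950.0, 10100.0, 9900.0):
    stats.update(altitude)
```

Each update costs constant time and the summary is available at any moment. Handling a model that evolves over time would additionally require forgetting, for example with a sliding window or a decay factor.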

[Aggarwal et al.] “Data Streams: Models and Algorithms”. Advances in Database Systems, Springer, 2007
[Aguilar-Ruiz, Gama] “Data Streams”. Journal of Universal Computer Science, 2005
[Barbará] “Requirements for clustering data streams”. SIGKDD’02.



• New challenges introduced by evolving data, such as:
– resource-aware learning
– change detection
– novelty detection
– important application areas where data evolution must be taken into account
– how learning under constraints (time, storage capacity and other resources) is affected by data evolution
– how context can help the learning process


Change and concept drift

[Figure: four panels plotted over time: sudden drift, gradual drift, incremental drift, reoccurring contexts]

[Joao Gama 2010]

Concept drift: the underlying concept may shift unexpectedly from time to time.

• Changes appear due to:
– Adversary actions
– Varying personal interest
– Changing population
– Complex environment


Required features

• Examples have to be processed as they arrive
• Each example should be processed:
– in small constant time
– using a fixed amount of main memory
– in a single scan of the data
– without revisiting old records (or with reduced revisiting)

• Produce models equivalent to the one that would be obtained by a batch data-mining algorithm

• Detect and react to concept drift

[Joao Gama 2010]
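One common way to meet the "detect and react to concept drift" requirement is to monitor the learner's online error rate. The slides do not say which detector is used. The sketch below follows the Drift Detection Method (DDM) of Gama et al., with a warning level at two standard deviations above the best error level seen and a drift level at three. The class name, the 30-example warm-up and the synthetic outcome stream are assumptions made for the example.

```python
import math

class DriftDetector:
    """DDM-style monitor of a classifier's online error rate."""

    def __init__(self, min_samples=30):
        self.min_samples = min_samples
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, is_error):
        """Feed one prediction outcome; return 'stable', 'warning' or 'drift'."""
        self.n += 1
        self.errors += int(is_error)
        p = self.errors / self.n              # current error rate
        s = math.sqrt(p * (1 - p) / self.n)   # its standard deviation
        if self.n < self.min_samples:
            return "stable"
        if p + s < self.p_min + self.s_min:   # remember the best level seen so far
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + 3 * self.s_min:
            self.reset()                      # start over for the new concept
            return "drift"
        if p + s > self.p_min + 2 * self.s_min:
            return "warning"
        return "stable"

detector = DriftDetector()
# Synthetic outcome stream: 10% errors for 100 examples, then only errors.
outcomes = [i % 10 == 0 for i in range(1, 101)] + [True] * 20
signals = [detector.update(o) for o in outcomes]
```

Each update uses constant time and memory and the data is scanned once. The warning state gives the learner time to prepare a replacement model before drift is confirmed.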


Recurrent concepts

• Many learning algorithms deal with concept drift
– Based on: time windows, ensembles, drift detection.
– FLORA, SEA, DWM, DMM, ...

• What about recurrent concepts?
– A particular type of concept drift.
– With forgetting mechanisms, past data and models are discarded.
– However, it is common for concepts to reappear.


Context and data stream


Context

• Context representation:
• Context similarity:




Context integration

• We want to integrate context information with previously learned models.

• freqC is the most frequent context in a sequence of context states {C1, C2, ..., Cn}

• Concept history with associated context: h(Mk|Ci)

• Estimate that Mk represents the current underlying concept given the current context.
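The two quantities above can be sketched as follows. `freq_context` picks the most frequent context in a sequence. The history h(Mk|Ci) is kept here as simple co-occurrence counts. The slide does not state how the authors compute the history or the estimate, so the counting scheme, the function names and the context labels (`storm`, `clear`) are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def freq_context(context_states):
    """Return the most frequent context in a sequence of context states."""
    return Counter(context_states).most_common(1)[0][0]

# Concept history h(Mk|Ci), stored as counts of how often each model
# was in use under each context (an assumed representation).
history = defaultdict(Counter)

def record_use(model_id, context):
    """Note that `model_id` was the active model during a period with `context`."""
    history[context][model_id] += 1

def estimate(model_id, context):
    """Share of past periods with `context` in which `model_id` was the active model."""
    total = sum(history[context].values())
    return history[context][model_id] / total if total else 0.0

# Hypothetical history: M1 was used twice under "storm", M2 once under each context.
for model_id, context in [("M1", "storm"), ("M1", "storm"), ("M2", "storm"), ("M2", "clear")]:
    record_use(model_id, context)

current = freq_context(["storm", "clear", "storm"])
score_m1 = estimate("M1", current)
```

With this history, M1 is the more plausible candidate when the recent context is mostly `storm`. An actual system would combine such a score with the stored model's accuracy on recent records.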


Model Storage

• Model storage for a model Mk:
– the period k where the model was used.
– using NB requires storing the CV
– the frequent context freqC for period k.
– accuracy of the model when it was in use.

• Represented as the tuple:


Model Retrieval

• Model retrieval for a model Mk:

– using a sample Sn of recent records,

– compute the MSE for Mk

– get the freqC for Sn

– use history h(Mk|freqC)

• The utility favours models with the highest accuracy and with a context most similar (minimum distance) to the current one.

• Retrieve the model with highest utility as:


CALDS: learning process

• Incrementally learn the underlying concept
• When a warning is signaled:
– Prepare a new base learner for the possible new concept
– Anticipate the drift
• When drift is detected:
– Store the current model
– Reuse a previously learned model when the underlying concept is recurrent.


CALDS: learning process


Improvements integrating context

Overall accuracy: 72.5%; 69.6%; 62.2%


Other current applications

• ESA - European Space Agency

– Event Reporting Tool for non-manned satellite passes (Cryosat monitoring)



Current applications

• ESA - European Space Agency / Galileo Industries

– Galileo - Ground Control Segment Central Monitoring & Control Facility



Some current applications

• Portuguese Navy

– Singrar – Integrated System for Ship Repair and Resource allocation



The process


Integrated Risk

Plans Activation / Maintenance

Drillings Training



Space Weather


Why – Space Weather?

• To protect systems and people that might be at risk from space weather effects, we need to understand the causes of space weather.

Pedro Sousa

Space Weather Decision Support System

• SWDSS: third project on space weather (SW) financed by the European Space Agency (ESA)

• The main objective of SWDSS is to develop software capable of storing, manipulating and reacting to adverse space weather situations in spacecraft:

– Providing tools for analyzing the collected data;

– Supplying reporting facilities for systems management;

– Supplying a knowledge discovery tool for nowcast, forecast and data mining.


Data sources and providers

• Mission’s telemetry (payload and/or housekeeping) data and processed data

• Mission’s auxiliary data, e.g. orbital coordinates, apogee and perigee crossings, station coverage and hand-over, events, 3D models, metadata

• Data available from other sources, e.g. NOAA, SIDC, SWENET, national agencies

• Data from ground-based measurements


Satellite Monitoring



• Huge amount of aviation data
1. Integrate data (micro and macro level)
2. Enrich data with semantics
3. Map data to techniques to discover patterns (static and streams):
   1. Anomalies
   2. Predictive
   3. Sequences
   4. Context influence

• Data mining in other similar domains has obtained results

• Next step: data mining for aviation safety


Ernestina Menasalvas Ruiz Pedro Sousa