Synthesis of Streaming Data from Multiple Sensors via Embedded Data Extraction April 15 th , 2004 Project Report Magdiel Galán CSE591: DataMining Dr. Huan Liu Spring 2004 http://www.public.asu.edu/~mgalan/ StreamProjApr15.ppt


Synthesis of Streaming Data from Multiple Sensors via Embedded Data Extraction

April 15th, 2004 Project Report

Magdiel Galán

CSE591: DataMining

Dr. Huan Liu

Spring 2004

http://www.public.asu.edu/~mgalan/StreamProjApr15.ppt

Outline
  • Problem/Project Description
  • Sampling
  • Smoothing
  • Clustering
  • Current Status
  • Plans

Project Description
  • Synthesis of streaming data from multiple sensors (~100’s) via embedded data extraction for mission-critical applications
  • Work in conjunction with Motorola’s Human Interface Lab (ongoing project)
  • Simulation environment

Project Description
  • Goal: Develop a driver assistance system that provides feedback, but not control, during unsafe instances
    - Unsafe instances arising from distractions caused by cellphones, PDAs, email
  • Why: Targets a government initiative to create a safer car environment amid the information-age explosion
  • How: Develop an intelligent system by mining streaming data from multiple automotive sensors
  • Development work is being done on a driving simulator with projection screens and up to 400 parameters/sensors, including video links for eye gaze and foot/pedal movement

Sample Cases
  • Case Scenario #1: Passing slow traffic
    - which slowed down due to an accident
    - which you are also rubbernecking
    - while fidgeting with your radio
  • Case Scenario #2: Making a left turn
    - while hearing directions from MapTracker
    - while checking the time because you are late
    - while reaching for the cellphone with an incoming call

Simulation Environment

150 Simulated View

Driving Experience

[Diagram labels: Gas, Engine Temp, Batt, Oil, PDA, Gear Shift, CD, Cell Phone, A/C, Air Bag, Acceleration, Lateral Acc., Sonar Proximity Sensor, Wheel Rotation, Brake Pressure, RPMs, GPS, Internet, Driver]

Motivation
  • Primary Interest: Robotics
    - Merging of Sensors/Sensor Fusion
      - optical proximity (IR, sonar, radar)
      - location (GPS, visual maps)
      - movement (actuators, rotations)
      - system (battery, temperature, bump switches)
    - Problem: decide agent’s next best action vs. a goal
    - Not too dissimilar from an automobile environment
  • Other Applications:
    - Manufacturing Environment
      - Increase yields/productivity and reduce defects using daily quality-control monitoring data (100’s Parameters 1K’s)
      - Pentium Ex.: Oxide Thickness, Poly Width, Boron Implant Density, Plasma Etch eV’s, Litho PM, Diffuser RPMs, etc.

Stream Data Properties
  • Numerical/Continuous
    - Speed
    - Steering/Heading
    - Acceleration (Forward/Lateral)
    - Distance (Lane Edge, Vehicle in Front)
  • Categorical
    - Lane Position
    - Gear: P/R/D/OD/L1/L2
    - Headlights On/Off
    - Radio/CD On
    - Incoming Call
  • Sampling Rate: 60 Hz
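
One way to picture a single 60 Hz reading that mixes the numerical and categorical properties listed above is a small record type. This is only a sketch: the `DrivingSample` type, its field names, and the units are hypothetical and are not taken from the simulator's actual data format.

```python
from dataclasses import dataclass
from enum import Enum

SAMPLING_RATE_HZ = 60  # sampling rate stated on the slide


class Gear(Enum):
    """Categorical gear positions listed on the slide."""
    P = "P"
    R = "R"
    D = "D"
    OD = "OD"
    L1 = "L1"
    L2 = "L2"


@dataclass(frozen=True)
class DrivingSample:
    # Numerical/continuous attributes (names and units are assumptions)
    t: float                   # seconds since start of the drive
    speed_kmh: float
    steering_deg: float
    heading_deg: float
    accel_forward: float
    accel_lateral: float
    dist_lane_edge: float
    dist_vehicle_ahead: float
    # Categorical attributes
    lane_position: str
    gear: Gear
    headlights_on: bool
    radio_cd_on: bool
    incoming_call: bool


def timestamp(sample_index: int) -> float:
    """Elapsed time in seconds of the sample with the given index."""
    return sample_index / SAMPLING_RATE_HZ
```

Keeping the categorical fields as enums and booleans, rather than numeric codes, makes it harder to average them by accident later on.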

Critical/Special Conditions
  • Left/Right Turn
  • Passing/Changing Lanes
  • U-Turn
  • Reverse
  • Tailgating
  • Not On Road

Some Warning Signs
  • Lane drifting
  • Erratic behavior
    - droopy eyes
    - eyes not facing the road
    - foot/pedal movements do not correspond with road conditions
  • Incoming call while performing a critical maneuver

Goal
  • Identify instances outside normal patterns as an indication of an abnormal situation
    - Hence: need to draw the driver’s attention to the impending situation
  • Ultimate Goal: Develop a bootstrapping mechanism that combines driving-situation classifiers (e.g., LeftTurn/Passing) with instance selection methods in active learning
    - Bootstrapping: selecting high-utility data for re-training

Instance Selection Properties
  • Selected instances are representative
  • Instance selection reduces rows
  • Ideal outcome of instance selection: choose a data subset that achieves the same result as the whole data, with little or no deterioration in performance (P)
  • Should be model independent: for any two models $M_i$ and $M_j$, $\Delta P(M_i) \doteq \Delta P(M_j)$

[LM01]

Problem #1: Sampling
  • Initial step towards instance selection: select a representative subset
  • Divide the population into a collection of elements that must cover the whole population without overlapping [GHL01]
    - These are called sampling units
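
The idea of sampling units can be sketched as follows. The slide does not say how the units are formed, so the fixed-size consecutive windows and the "take the first element of each unit" rule below are assumptions made purely for illustration; the function names are hypothetical.

```python
from typing import Iterator, List, Sequence


def sampling_units(stream: Sequence[float], unit_size: int) -> Iterator[List[float]]:
    """Split a stream into consecutive, non-overlapping sampling units.

    Every element lands in exactly one unit, so the units cover the
    whole population without overlapping.
    """
    if unit_size <= 0:
        raise ValueError("unit_size must be positive")
    for start in range(0, len(stream), unit_size):
        yield list(stream[start:start + unit_size])


def systematic_sample(stream: Sequence[float], unit_size: int) -> List[float]:
    """Pick one element (here: the first) from every sampling unit."""
    return [unit[0] for unit in sampling_units(stream, unit_size)]
```

For ten readings and a unit size of four, this yields three units (the last one shorter) and a three-element sample.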

Sampling Results

Sampling at 10 ms (x-axis: signal duration; y-axis: count)

Problem #2: Smoothing
  • Reduce/filter out noise and outliers
  • Smoothing techniques used:
    - Bin median/rolling average [LM01]/[D03]
      - Median preferred over mean since it is less sensitive to outliers
    - Thresholding/bin boundaries [LM01]/[HK01]
      - 10% offset threshold
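
The smoothing techniques named above might look roughly like this in code. This is a hedged sketch, not the project's implementation: the window and bin sizes are free parameters, and the reading of "10% offset threshold" as "hold the last accepted value until a reading moves more than 10% away from it" is an assumption.

```python
from collections import deque
from statistics import median
from typing import Iterable, List


def rolling_median(values: Iterable[float], window: int) -> List[float]:
    """Median of the last `window` readings (fewer at the start of the stream)."""
    recent = deque(maxlen=window)
    out = []
    for v in values:
        recent.append(v)
        out.append(median(recent))
    return out


def bin_medians(values: List[float], bin_size: int) -> List[float]:
    """Replace every reading in each consecutive bin with that bin's median."""
    out = []
    for start in range(0, len(values), bin_size):
        chunk = values[start:start + bin_size]
        out.extend([median(chunk)] * len(chunk))
    return out


def threshold_smooth(values: Iterable[float], offset: float = 0.10) -> List[float]:
    """Hold the last accepted level until a reading differs from it by more
    than `offset` (relative); this interpretation is assumed, not sourced."""
    out = []
    level = None
    for v in values:
        if level is None or abs(v - level) > offset * abs(level):
            level = v
        out.append(level)
    return out
```

In a speed trace such as 50, 51, 120, 52, 53, the rolling median with a window of three never outputs the 120 spike, which illustrates why the median is less sensitive to outliers than the mean.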

PreSmoothing - RAW Data

x-axis: driving time elapsed in minutes

y-axis: speed(km/h); steering(degrees), heading(degrees)

RAW Data Map/Course

Route Map – starting point at (0,0)

Smoothing Results - Median

x-axis: driving time elapsed in minutes

y-axis: speed(km/h); steering(degrees), heading(degrees)

Smoothing Results - Median

Smoothing Results - Threshold

Smoothing Results - Threshold

Dr. Liu’s Incremental Instance Selection Algorithm

Given: Data streams with instances I
Output: indicative instances

For each data stream
Do the following incrementally
    Create a profile P for I
    Check new instance i against P
    if i is an outlier of P
        Return i
    else
        Update P with i
End do
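
A minimal Python sketch of the loop above for a single numerical stream follows. The algorithm as stated does not define the profile or the outlier test, so both are assumptions here: the profile is a running mean and standard deviation (Welford's method), and an instance counts as an outlier when it lies more than `k` standard deviations from the mean after a short warm-up. The names `Profile`, `indicative_instances`, `k`, and `warmup` are hypothetical.

```python
import math
from typing import Iterable, Iterator


class Profile:
    """Running mean/variance of the instances absorbed so far (Welford's method)."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    def std(self) -> float:
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_outlier(self, x: float, k: float) -> bool:
        return abs(x - self.mean) > k * self.std()


def indicative_instances(stream: Iterable[float], k: float = 3.0,
                         warmup: int = 10) -> Iterator[float]:
    """Yield instances that are outliers of the profile; absorb the rest."""
    profile = Profile()
    for x in stream:
        if profile.n >= warmup and profile.is_outlier(x, k):
            yield x            # "Return i"
        else:
            profile.update(x)  # "Update P with i"
```

Because outliers are never folded into the profile, a single spike does not widen the band that later instances are checked against; whether that is the intended behavior is a design choice the pseudocode leaves open.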

Outliers

Problem #3: Clustering
Why?
  • Data is unclassified
  • Previous results used numerical data on the most significant key parameters
  • Develop clusters exemplifying ALL attributes
  • Select instances that do not belong to a cluster as the triggering mechanism
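
The triggering idea in the last bullet can be sketched for purely numerical attributes as below. This is an illustration only: describing each cluster by a center and a radius, and using Euclidean distance, are assumptions, and the example values are made up.

```python
import math
from typing import List, Sequence, Tuple

# (center, radius): a hypothetical cluster description for this sketch
Cluster = Tuple[Sequence[float], float]


def belongs_to_any(instance: Sequence[float], clusters: List[Cluster]) -> bool:
    """True if the instance lies within the radius of at least one cluster."""
    return any(math.dist(instance, center) <= radius
               for center, radius in clusters)


def triggers(instances: List[Sequence[float]],
             clusters: List[Cluster]) -> List[Sequence[float]]:
    """Instances that belong to no cluster; these would raise the warning."""
    return [x for x in instances if not belongs_to_any(x, clusters)]
```

For example, with clusters around (speed, steering) = (60, 0) and (20, 30), a reading of (100, 45) falls outside both and would be selected.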

Stream Clustering Challenges
  • Large “unclassified” database
  • Fast online resolution within a small window
    - 0.5 to 2 or 3 seconds
  • One-pass-only restriction (need fast I/O)
  • Mix of numerical and categorical data
    - Traditional algorithms do not work well for categorical attributes (remember P/R/D/OD/L1/L2, or CD On)
    - Centroid approach cannot be used
    - Hard to reflect the properties of the neighborhood of the points
  • Memory constraints
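
To make the mixed-data difficulty concrete: a gear value such as D or R has no meaningful average, so a centroid cannot be formed, but a pairwise dissimilarity can still be defined. The sketch below uses a Gower-style measure (range-scaled difference for numerical attributes, simple matching for categorical ones). This technique is swapped in for illustration and is not claimed to be the project's method; the attribute names and ranges are hypothetical.

```python
from typing import Dict, Union

Value = Union[float, str, bool]


def mixed_dissimilarity(a: Dict[str, Value], b: Dict[str, Value],
                        ranges: Dict[str, float]) -> float:
    """Average per-attribute dissimilarity in [0, 1].

    Attributes listed in `ranges` are treated as numerical and scaled by
    their range; all other attributes are treated as categorical and
    compared by simple matching (0 if equal, 1 otherwise).
    """
    total = 0.0
    for key in a:
        if key in ranges:
            total += min(abs(a[key] - b[key]) / ranges[key], 1.0)
        else:
            total += 0.0 if a[key] == b[key] else 1.0
    return total / len(a)
```

Two records that differ by 20 km/h on a 200 km/h range, have different gears, and agree on the CD state would score (0.1 + 1 + 0) / 3.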

Clustering Techniques vs. Streaming Data
  • SVM
    - Good at handling multidimensional data
    - Not good: needs classified data, lots of I/O, data in memory
  • BIRCH
    - Good at handling multidimensional data and large databases; single scan, linear I/O time
    - Not good: predominantly for “numerical” attributes; order dependent
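
BIRCH's single-scan property comes from summarizing each cluster with a clustering feature (N, LS, SS) that can be updated one point at a time. The sketch below shows only that summary, not the CF-tree built on top of it; the class and function names are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class ClusteringFeature:
    """BIRCH-style summary of a set of numerical points: (N, LS, SS)."""
    n: int = 0
    ls: List[float] = field(default_factory=list)  # per-dimension linear sum
    ss: float = 0.0                                # sum of squared norms

    def add(self, point: Sequence[float]) -> None:
        """Absorb one point; the point itself does not need to be stored."""
        if not self.ls:
            self.ls = [0.0] * len(point)
        self.n += 1
        self.ls = [s + p for s, p in zip(self.ls, point)]
        self.ss += sum(p * p for p in point)

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.ls]

    def radius(self) -> float:
        """Root-mean-square distance of the absorbed points from the centroid."""
        c = self.centroid()
        return math.sqrt(max(self.ss / self.n - sum(x * x for x in c), 0.0))


def merge(a: ClusteringFeature, b: ClusteringFeature) -> ClusteringFeature:
    """CFs are additive: two non-empty summaries of equal dimension combine
    without revisiting the underlying data."""
    return ClusteringFeature(a.n + b.n,
                             [x + y for x, y in zip(a.ls, b.ls)],
                             a.ss + b.ss)
```

The summary relies on adding and squaring attribute values, which is consistent with the note above that the method suits numerical attributes: values such as P/R/D cannot be summed.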

Clustering Techniques vs. Streaming Data (2)
  • CURE (Clustering Using REpresentatives) [D03]
    - Good at handling outliers; hierarchical
    - Not good: random sampling (won’t fit streaming)
  • ROCK (RObust Clustering using linKs) [D03]
    - Good at hierarchical clustering for categorical attributes
    - Not good: random sampling for scale-up
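
ROCK's "links" can be illustrated with a short sketch: two categorical records are neighbors when their Jaccard similarity reaches a threshold, and the link count of a pair is the number of neighbors they share. Only the link computation is shown, not the hierarchical merging. The threshold value and the example attribute labels are hypothetical, and in this sketch a record is not counted as its own neighbor.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity of two sets of categorical values."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def neighbors(records: List[FrozenSet[str]], theta: float) -> Dict[int, Set[int]]:
    """Map each record index to the indices of records with similarity >= theta."""
    nbrs: Dict[int, Set[int]] = {i: set() for i in range(len(records))}
    for i, j in combinations(range(len(records)), 2):
        if jaccard(records[i], records[j]) >= theta:
            nbrs[i].add(j)
            nbrs[j].add(i)
    return nbrs


def links(records: List[FrozenSet[str]], theta: float) -> Dict[Tuple[int, int], int]:
    """Number of common neighbors for every pair of records."""
    nbrs = neighbors(records, theta)
    return {(i, j): len(nbrs[i] & nbrs[j])
            for i, j in combinations(range(len(records)), 2)}
```

Computing neighbors for all pairs is quadratic in the number of records, which is consistent with the note above that the method relies on random sampling to scale up.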

My 1st Clustering Attempt…

Move in Reverse

My 1st Clustering Attempt(2)

Zoom Next Page

My 1st Clustering Attempt(3)

Move in Reverse

Current Status/Plans
  • This is an ongoing project
  • Cluster technique development
    - Evolve from known methods?
  • Generalization of the technique
    - Not just automobile streaming data

References

[LM01] H. Liu, H. Motoda. “Data Reduction via Instance Selection”. Instance Selection and Construction for Data Mining. 2001. KAP. ASU Library

[GHL01] B. Gu, F. Hu, H. Liu. “Sampling: Knowing Whole From its Part”. Instance Selection and Construction for Data Mining. 2001. KAP. ASU Library

[HK01] J. Han, M. Kamber. Data Mining: Concepts and Techniques. Chps. 3, 8: Data Cleaning, Clustering. Morgan Kaufmann. ASU Library

[D03] M. Dunham. Data Mining: Introductory and Advanced Topics. Prentice Hall. Chps. 3-5: Mining Techniques, Classification, Clustering. ASU Library