The Old Way
Big Data
Streams Programming Model
• Streams applications are data flow graphs that consist of:
  – Tuples: structured data items
  – Operators: reusable stream analytics
  – Streams: series of tuples with a fixed type
  – Processing Elements: operator groups in execution
Streams Programming Language

    composite Main {
      type
        Entry = int32 uid, rstring server,
                rstring msg;
        Sum = uint32 uid, int32 total;
      graph
        stream<Entry> Msgs = ParSource() {
          param servers: "logs.*.com";
                partitionBy: server;
        }

        stream<Sum> Sums = Aggregate(Msgs) {
          window Msgs: tumbling, time(5),
                       partitioned;
          param partitionBy: uid;
        }

        stream<Sum> Suspects = Filter(Sums) {
          param filter: total > 100;
        }

        () as Sink = FileSink(Suspects) {
          param file: "suspects.csv";
        }
    }
[Figure: data flow graph for the application above, with parallel ParSrc, Aggr and Filter operators feeding Sink operators.]
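The same data flow can be sketched outside of Streams to make the windowing semantics concrete. The Python below is only an illustration, not how the Streams runtime executes SPL: it assumes each tuple carries an explicit timestamp `ts` (a field that is not in the slide's `Entry` type) and that `total` is the number of messages per `uid` in each 5-second tumbling window, since the slide does not show the aggregate function. `run_demo` and its input data are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class Entry(NamedTuple):
    ts: float      # arrival time in seconds (assumed; not part of the slide's Entry type)
    uid: int
    server: str
    msg: str


class Sum(NamedTuple):
    uid: int
    total: int


def aggregate(msgs: Iterable[Entry], window_s: float = 5.0) -> Iterator[Sum]:
    """Tumbling time window partitioned by uid; total = messages per uid (assumed)."""
    counts: Counter = Counter()
    window_end = None
    for e in msgs:
        if window_end is None:
            window_end = e.ts + window_s
        # Flush every window that closed before this tuple arrived.
        while e.ts >= window_end:
            for uid, total in sorted(counts.items()):
                yield Sum(uid, total)
            counts.clear()
            window_end += window_s
        counts[e.uid] += 1
    # Flush the last, possibly partial, window at end of input.
    for uid, total in sorted(counts.items()):
        yield Sum(uid, total)


def suspects(sums: Iterable[Sum], limit: int = 100) -> Iterator[Sum]:
    """Counterpart of the Filter operator: keep tuples with total > limit."""
    return (s for s in sums if s.total > limit)


def run_demo() -> list:
    # Hypothetical input: uid 7 sends 150 messages within one window, uid 8 sends 3.
    msgs = [Entry(i * 0.01, 7, "logs.a.com", "x") for i in range(150)]
    msgs += [Entry(1.0 + i, 8, "logs.b.com", "y") for i in range(3)]
    msgs.sort(key=lambda e: e.ts)
    return list(suspects(aggregate(msgs)))
```

In this sketch the generators play the role of streams and the functions play the role of operators; the partitioned, parallel execution shown in the graph is left out.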
Source → Compilation → Execution
[Figure: the SPL compiler turns the SPL source into processing elements (PEs) and the connections between them, including Source and Sink PEs. The Streams Runtime (job management, security, continuous resource management) runs the PEs across several x86 hosts.]
Applications
• Smart Grid
  – Connecting generators, distributors and transmitters
• Social Media Analytics
  – Sentiment analysis of movie marketing
  – Disease tracking
• Medical
  – SickKids: neonatal monitoring
  – ANS modeling
Smart Grid
Pacific Northwest Smart Grid
• Goals
  – Manage peak demand
  – Integrate renewable energy sources
  – Address constrained resources
  – Reliability, efficiency
  – Dynamically choose the most economical resource
[Figure: two flows along the chain generators, transmitters, distributors, consumers: the Power and Demand Flow, and the Transactive Control Flow, labeled with incentive, feedback and the EIOC.]
Transactive Architecture
[Figure: architecture diagram. Labels shown:]
• Field Communications Network (Public Wireless Access, WiMax, BPL)
• Utility Subproject Sites
• Transmission & Distribution and Generation Assets: line mounted sensors, substation, smart metering, PHEVs, distributed generation
• Remote IT Assets: servers, applications
• iCS (ISO/IEC 18012-2 application interoperability framework)
• Message Broker, Micro broker & MQTT
• Energy Information Operations Center (EIOC)
• InfoSphere Streams
• Data mediation and real-time analysis
• Netezza Data Warehouse
• Off-line, deep analytics (data mining / model building)
• Dashboard
• 3Tier Renewable Forecasts
• Alstom T&D EMS & Market SW
High Speed Data Mediation
[Figure: data flow diagram. Labels shown:]
• ActiveMQ JMS Provider
• ActiveMQ JMS Client Source Adapter
• Data Transform
• Three parallel branches, each with: table filter, data prep, ODBCAppend (table instance)
• Netezza ODBC Provider
• Alerts File, Daily Message File
• Annotations: message, alert, alerts, table access specific, one job per Netezza table
Social Media
Super Bowl Movie Trailers
• Effectiveness
  – How many people are talking about the movies?
  – Who are they?
  – Do they intend to see it?
  – How does this compare to other movies?
• Big Data
  – Over 1 billion messages
  – Over 30 million profiles extracted
Buzz During Super Bowl
[Figure: chart with one series per movie: 21 Jump Street, Act of Valor, Battleship, Dr. Seuss' The Lorax, G.I. Joe: Retaliation, Ghost Rider: Spirit of Vengeance, John Carter, The Dictator, The Avengers. Vertical axis from 0 to 30000; horizontal axis from 5pm to 12am.]
Avid Movie Goer Buzz During Super Bowl
[Figure: chart covering Project X, John Carter, Battleship, Ghost Rider, 21 Jump Street, The Dark Knight Rises, G.I. Joe, Spider-man, The Avengers, Act of Valor, The Lorax and The Dictator. Axis from 10% to 90%.]
Buzz by Demographics
[Figure: one panel each for Battleship, The Dictator and Act of Valor, showing the gender split (female, male) and the top 10 markets. Top 10 market lists shown: CA TX NY VA FL GA NC OH MD NJ; CA TX NY GA FL NC MD VA OH PA; CA TX NY OH FL NC VA GA MD PA.]
Disease Tracking
Social Media Analytics Architecture
[Figure: architecture diagram. Labels shown:]
• Online Flow: data-in-motion analysis
• Offline Flow: data-at-rest analysis
• Platforms: InfoSphere Streams, InfoSphere BigInsights
• Stages: Data Ingest & Prep.; Text Analytics: Timely Insights; Entity Integration: Profile Resolution; Predictive Analytics: Action Determination
• Data: Social Media Data, Customer Database, Consumer Lists, Customer & Prospect Profiles, Social Media Consumer Profiles, Customer Models
• Output: Timely Decisions
Medical
"Data Baby"
Real Neonatal ICU
Current ICU Monitoring
[Figure: Patient, three Devices, Patient Record.]
Fixed, threshold-based alerts (e.g., heart rate below 45 bpm)
What's Missing
• Automation
• Data retention
• Complex temporal correlations
  – e.g., Alert when both SpO2 and blood pressure drop below a threshold for 45 seconds
  – e.g., Alert when variability of the HR is below a threshold for a 30 minute window
  – e.g., Alert when the probability of observing a bad event in the next 6 hours is high
• Further exploration of physiological data
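To illustrate what a windowed alert of this kind involves, here is a minimal Python sketch of the heart rate variability rule. It is not taken from the deployment: variability is approximated by the population standard deviation of the heart rate over a sliding 30-minute window, and the threshold `min_std_bpm` is a hypothetical value chosen only for the example.

```python
from collections import deque
from statistics import pstdev
from typing import Deque, Iterable, Iterator, Tuple


def low_variability_alerts(
    samples: Iterable[Tuple[float, float]],  # (time in seconds, heart rate in bpm)
    window_s: float = 30 * 60,
    min_std_bpm: float = 2.0,                # hypothetical threshold, not from the slides
) -> Iterator[float]:
    """Yield each sample time at which HR variability over the last window is too low."""
    window: Deque[Tuple[float, float]] = deque()
    first_t = None
    for t, hr in samples:
        if first_t is None:
            first_t = t
        window.append((t, hr))
        # Keep only samples in the half-open interval (t - window_s, t].
        while window[0][0] <= t - window_s:
            window.popleft()
        # Stay silent until a full window of data has been observed.
        if t - first_t >= window_s and len(window) >= 2:
            if pstdev([h for _, h in window]) < min_std_bpm:
                yield t
```

Unlike a fixed threshold on a single reading, the rule needs state (the window contents) and a warm-up period before it can fire. Recomputing the standard deviation on every sample keeps the sketch short; an incremental computation would be the natural choice for high-rate signals.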
SickKids Deployment
Simple Example
• "Baby crashing"
  – Correlate SpO2 with mean arterial pressure
    • SpO2 < 85
    • Mean BP < gestational age
  – For 20 seconds
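The rule above can be sketched as a small stateful detector. This is an illustrative Python version rather than the SickKids implementation: it assumes SpO2 is given in percent, mean blood pressure in mmHg and gestational age in weeks (the slide does not state units), and that both signals arrive already aligned in one reading. The `Vitals` type and `crash_alerts` function are hypothetical names.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Vitals(NamedTuple):
    t: float        # time in seconds
    spo2: float     # percent (assumed unit)
    mean_bp: float  # mmHg (assumed unit)


def crash_alerts(readings: Iterable[Vitals], gestational_age_weeks: float,
                 hold_s: float = 20.0) -> Iterator[float]:
    """Yield the time at which both conditions have held continuously for hold_s."""
    since: Optional[float] = None   # start of the current run of bad readings
    alerted = False                 # at most one alert per continuous episode
    for r in readings:
        bad = r.spo2 < 85 and r.mean_bp < gestational_age_weeks
        if not bad:
            since, alerted = None, False
            continue
        if since is None:
            since = r.t
        if not alerted and r.t - since >= hold_s:
            alerted = True
            yield r.t
```

For example, with one reading per second and both conditions true from t = 5 to t = 30, the detector fires once, at t = 25. A dip in only one of the two signals never fires.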
Not So Simple Example
• Detecting onset of sepsis in premature infants
• Data sources
  – Systolic, diastolic and mean arterial blood pressure
  – SpO2
  – Respiration rates
  – Electrocardiograms
  – Static clinical data
• Fusion and scoring
  – Based on models specified by physicians
  – Data-driven approach uses machine learning to discover new models
• Deployed at SickKids hospital in Ontario, Canada, monitoring NICU patients
Modeling the ANS with Heart Rate Variability
[Figure: processing pipeline. Labels shown:]
• Stages: QRS Detection → RR Generation → Ectopic/Abnormal Beat Detection → Interpolation → Windowing (one per metric) → HRV1, HRV2, HRV3, HRV5, ..., HRVk → Synch
• Operators: Source (receive, decode raw ECG); Transform (QRS mask); Clean and Interpolate (FMP anomaly detection, cubic spline interpolation); Select; Aggregation; Transform (HRV mask); Synchronize
• Legend: Standard, Our research
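The middle stages of this pipeline (RR generation, abnormal beat handling, interpolation, windowing, HRV metrics) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the pipeline from the figure: QRS detection is assumed to have already produced beat times, abnormal intervals are flagged with a simple median rule (an assumption, not the FMP anomaly detection named in the figure), linear interpolation stands in for the cubic spline, a single tumbling window replaces the per-metric windows, and SDNN and RMSSD are used as two common HRV metrics because the figure does not say what HRV1 to HRVk compute.

```python
from statistics import median, pstdev
from typing import Dict, List, Sequence, Tuple

RR = Tuple[float, float]  # (time of beat in seconds, interval since previous beat)


def rr_intervals(beat_times: Sequence[float]) -> List[RR]:
    """RR generation from detected beat times."""
    return [(b, b - a) for a, b in zip(beat_times, beat_times[1:])]


def replace_ectopic(rr: Sequence[RR], tol: float = 0.2) -> List[RR]:
    """Flag intervals more than tol * median away from the median (assumed rule)
    and replace them by linear interpolation between the nearest normal ones."""
    if not rr:
        return []
    med = median(v for _, v in rr)
    ok = [abs(v - med) <= tol * med for _, v in rr]
    out: List[RR] = []
    for i, (t, v) in enumerate(rr):
        if ok[i]:
            out.append((t, v))
            continue
        left = next((j for j in range(i - 1, -1, -1) if ok[j]), None)
        right = next((j for j in range(i + 1, len(rr)) if ok[j]), None)
        if left is not None and right is not None:
            (t0, v0), (t1, v1) = rr[left], rr[right]
            out.append((t, v0 + (v1 - v0) * (t - t0) / (t1 - t0)))
        elif left is not None or right is not None:
            src = left if left is not None else right
            out.append((t, rr[src][1]))   # only one normal neighbour: copy it
        else:
            out.append((t, v))            # nothing normal to interpolate from
    return out


def rmssd(values: Sequence[float]) -> float:
    """Root mean square of successive differences."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return (sum(d * d for d in diffs) / len(diffs)) ** 0.5


def hrv_windows(rr: Sequence[RR], window_s: float = 300.0) -> List[Tuple[float, float, float]]:
    """Tumbling windows: (window start, SDNN, RMSSD) for windows with >= 2 intervals."""
    buckets: Dict[int, List[float]] = {}
    for t, v in rr:
        buckets.setdefault(int(t // window_s), []).append(v)
    return [(k * window_s, pstdev(vs), rmssd(vs))
            for k, vs in sorted(buckets.items()) if len(vs) >= 2]
```

A typical call chains the stages: `hrv_windows(replace_ectopic(rr_intervals(beat_times)))`. The final synchronization step, which aligns the outputs of the different metrics, is not shown.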