16
Sirenic Stream Infrastructure for Real-Time ICU Data Analysis Yanif Ahmad, Computer Science Email: [email protected] Collaborators: Raimond Winslow (BME) Yair Amir (CS), Suchi Saria (CS, MPH)

Sirenic Stream Infrastructure for the Real-Time Analysis ...idies.jhu.edu/wp-content/uploads/2015/09/IAS2014Ahmad.pdf · New Tools for Data Systems Design Initial results from July

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Sirenic Stream Infrastructure for the Real-Time Analysis ...idies.jhu.edu/wp-content/uploads/2015/09/IAS2014Ahmad.pdf · New Tools for Data Systems Design Initial results from July

SirenicStream Infrastructure for

Real-Time ICU Data Analysis

Yanif Ahmad, Computer ScienceEmail: [email protected]: Raimond Winslow (BME)Yair Amir (CS), Suchi Saria (CS, MPH)

Page 2: Sirenic Stream Infrastructure for the Real-Time Analysis ...idies.jhu.edu/wp-content/uploads/2015/09/IAS2014Ahmad.pdf · New Tools for Data Systems Design Initial results from July

Long-Lived Data Infrastructure

“Pipeline”: workflow-centric data and task management

• Multi-scale problems• Multi-faceted workloads

Data platform desiderata

• Transparency, flexibility• Scalability and usability

Diverse userbase

• Core team• Long tail of users

Page 3: Sirenic Stream Infrastructure for the Real-Time Analysis ...idies.jhu.edu/wp-content/uploads/2015/09/IAS2014Ahmad.pdf · New Tools for Data Systems Design Initial results from July

Workload-Driven Database ArchitecturesSpecialization dimension Applications Gains Enabling mechanism

Transactions Web services,e-commerce

80x speedup(since 2008)

Coordination restriction

Batch analytics Log analysis, BI 10x cheaper(since 2004)

Simpler, open infrastructure

Continuous and incremental

Monitoring, finance 100-1,000x speedup(since 2003)

Remove design impedance

Graphs, NoSQL Documents, web graphs, brains

10x cheaper10-100x speedup

Better hardware utilization

Approximate and probabilistic

Data cleaning and exploration

New subfield(~2004)

New algorithms and theory

Workflows and computational

Data-driven science,linear algebra, ML

10-100x speedup(since 2010)

Better hardware utilization and interoperability

“Wide” data Statistics, data exploration, ML

TBA (since 2013) Better hardware utilization

Qu

ery

Dat

ase

tE

me

rgin

g

Page 4: Sirenic Stream Infrastructure for the Real-Time Analysis ...idies.jhu.edu/wp-content/uploads/2015/09/IAS2014Ahmad.pdf · New Tools for Data Systems Design Initial results from July

Analysis algorithm

New Tools for Data Systems Design

Data wrangling

Desktop “silo”

Key challenges to specialization:• Implementation bottlenecks• No short feedback cycle

K3: declarative systems design• Technical aim: mechanize domain-

specific data tools construction• Usability aim: remove barriers in

turning analytics into services

Page 5: Sirenic Stream Infrastructure for the Real-Time Analysis ...idies.jhu.edu/wp-content/uploads/2015/09/IAS2014Ahmad.pdf · New Tools for Data Systems Design Initial results from July

Analysis algorithm

New Tools for Data Systems Design

Data wrangling

Request routing

Exposition

Desktop “silo”

Service “wrapper”

Key challenges to specialization:• Implementation bottlenecks• No short feedback cycle

K3: declarative systems design• Technical aim: mechanize domain-

specific data tools construction• Usability aim: remove barriers in

turning analytics into services

Page 6: Sirenic Stream Infrastructure for the Real-Time Analysis ...idies.jhu.edu/wp-content/uploads/2015/09/IAS2014Ahmad.pdf · New Tools for Data Systems Design Initial results from July

Analysis algorithm

New Tools for Data Systems Design

Coordination

Resource allocation

Fault tolerance

Data movement

Data wrangling

Request routing

Exposition

Desktop “silo”

Service “wrapper”

Key challenges to specialization:• Implementation bottlenecks• No short feedback cycle

K3: declarative systems design• Technical aim: mechanize domain-

specific data tools construction• Usability aim: remove barriers in

turning analytics into services

Page 7: Sirenic Stream Infrastructure for the Real-Time Analysis ...idies.jhu.edu/wp-content/uploads/2015/09/IAS2014Ahmad.pdf · New Tools for Data Systems Design Initial results from July

New Tools for Data Systems Design

C++14 LLVM

K3

Virtualization & provisioning

Storage

JIT compilation

Key challenges to specialization:• Implementation bottlenecks• No short feedback cycle

K3: declarative systems design• Technical aim: mechanize domain-

specific data tools construction• Usability aim: remove barriers in

turning analytics into services

Page 8: Sirenic Stream Infrastructure for the Real-Time Analysis ...idies.jhu.edu/wp-content/uploads/2015/09/IAS2014Ahmad.pdf · New Tools for Data Systems Design Initial results from July

New Tools for Data Systems Design

Initial results from July 2014:20 node, 160-core deployment, ~250GB web log dataset, and ~20GB molecular dynamics dataset

Key challenges to specialization:• Implementation bottlenecks• No short feedback cycle

K3: declarative systems design• Technical aim: mechanize domain-

specific data tools construction• Usability aim: remove barriers in

turning analytics into services

Page 9: Sirenic Stream Infrastructure for the Real-Time Analysis ...idies.jhu.edu/wp-content/uploads/2015/09/IAS2014Ahmad.pdf · New Tools for Data Systems Design Initial results from July

Building Data Pipelines in K3Goal: enhance usability through pipeline data products

Approach: data views, for data independence in pipeline data products

• A central logical abstraction mechanism in DBMS

Views naturally model:• Multi-resolution, and multi-modal data• Archiving (snapshots/checkpoints) and visualization

Query answering

Analysis and exploration

Data integration

View

?Tasks

Views

Page 10: Sirenic Stream Infrastructure for the Real-Time Analysis ...idies.jhu.edu/wp-content/uploads/2015/09/IAS2014Ahmad.pdf · New Tools for Data Systems Design Initial results from July

Physiological Time SeriesDataset: bedside monitors capturing patient state

• 16 channels per bed (e.g., ECG, NIBP, SpO2), 2kHz sampling rate per channel

Data challenge: alarms detection and patient state prediction

• Particularly for rapid onset diseases (e.g., sepsis, arrhythmias)

Project aims: design and deploy initial platform to facilitate real-timeanalysis algorithm design

GE Solar 8000i patient monitor

Page 11: Sirenic Stream Infrastructure for the Real-Time Analysis ...idies.jhu.edu/wp-content/uploads/2015/09/IAS2014Ahmad.pdf · New Tools for Data Systems Design Initial results from July

Sirenic Architecture and Data TasksLive data source since mid-August:JHH PICU (Dr. Mela Bembea, Director)3 beds, 1610 anonymized series

Data and model-driven analyses: • Signature (motif) identification

and extraction• Latent structure model

development

Workflow tasks:• HL7 parsing & deidentification• Model-driven simulation• Drill-down visualization

Page 12: Sirenic Stream Infrastructure for the Real-Time Analysis ...idies.jhu.edu/wp-content/uploads/2015/09/IAS2014Ahmad.pdf · New Tools for Data Systems Design Initial results from July

S. Saria, A. Rajani, J. Gould, D. Koller, A. Penn. Integration of Early Physiological Responses Predicts Later Illness Severity in Preterm Infants. Science Translational Medicine, September 2010. Vol. 2, Issue 48

S. Saria, D. Koller, A. Penn. Discovering shared and individual latent structure in multiple time series NIPS Predictive Models in Personalized Medicine, August 2010.

Modelling Health State Over Time

Page 13: Sirenic Stream Infrastructure for the Real-Time Analysis ...idies.jhu.edu/wp-content/uploads/2015/09/IAS2014Ahmad.pdf · New Tools for Data Systems Design Initial results from July

Real-time Views in Sirenic

Novel change propagation techniques based on multiple “delta” data representations

• Think “finite differences” for database queries

• Data always “in-flight”, facilitating delayed corrections (missing data, improved robustness)

• Clean snapshot semantics, and efficient snapshot management

SQL query, Q

Q1 Q2 Q3

Q21 Q22 Q23

Page 14: Sirenic Stream Infrastructure for the Real-Time Analysis ...idies.jhu.edu/wp-content/uploads/2015/09/IAS2014Ahmad.pdf · New Tools for Data Systems Design Initial results from July

Real-time Views in SirenicUse cases:

• Scalable inference via distributed, blocked Gibbs sampling

• Multi-resolution aggregate summaries and difference data structures for data exploration

Project status: initial data exploration, and infrastructure deployment

Novel change propagation techniques based on multiple “delta” data representations

• Think “finite differences” for database queries

• Data always “in-flight”, facilitating delayed corrections (missing data, improved robustness)

• Clean snapshot semantics, and efficient snapshot management

Page 15: Sirenic Stream Infrastructure for the Real-Time Analysis ...idies.jhu.edu/wp-content/uploads/2015/09/IAS2014Ahmad.pdf · New Tools for Data Systems Design Initial results from July

Data Applications

Data Views

Sirenic

K3: ProgrammableData Systems

Summary

Data Abstraction Layer

Systems Abstraction Layer

Page 16: Sirenic Stream Infrastructure for the Real-Time Analysis ...idies.jhu.edu/wp-content/uploads/2015/09/IAS2014Ahmad.pdf · New Tools for Data Systems Design Initial results from July

Thank You!

[email protected]://damsl.cs.jhu.edu