Data Science Platforms for Applications with Societal Impacts

http://dsi.usc.edu/

Cyrus Shahabi, Ph.D.

Professor of Computer Science, Electrical Engineering & Spatial Sciences

Chair, Department of Computer Science

Director, Data Science Institute (DSI)

Director, Informatics Program

Viterbi School of Engineering

University of Southern California

Los Angeles, CA 90089-0781

[email protected]

OUTLINE

DSI Overview

Transportation Data Platform

Social Media Data Platform

Health Data Platform

Smart City Data Platform

Closing Remarks

DSI Overview

A Data Science Research Center

Real-World Data & Applications

• TransDec: Transportation

• MediaQ: Social Media

• ATOM-HP: Health

• I3: Smart City

Tech Transfer

NGA

Fundamental Research

AMIA’16, ECCV’16, SIGKDD’16, SIGSPATIAL’16, VLDB’16, BigMM’17, ICDE’17, SDM’17

Transportation Data Platform

ADMS: An Exclusive Contract with LA-Metro

Stages: Input Traffic Data, Data Processing, Storage, Analysis & Visualization

Input traffic data:

• Highway (4500+ sensors)

• Arterial (4700 + 9500 sensors)

• Bus & Rail (2000+ buses)

• Event (~400 per day)

• Ramp meter

• CMS

Data processing: StreamInsight

46 MB/min

11 TB/Year

26 MB/min

Additional datasets:

• Transit Ridership Data: 4 years of ~1M rows

• Inrix Probe Data: 1 year of 400M rows

• Truck (WIM) Data: 3 years of 10M rows

Analysis: e.g., Traffic Forecasting (ICDM’13, KDD’16, SDM’17)

[Figure: Sensor 1, Sensor 2, Sensor 3, and Sensor 4 around an event location]

• 2011: ADMS RFP (Awarded to USC)

• 2011-2015: ADMS Developed (Research/Prototype by USC)

• 2015-2016: ADMS Extension (Awarded to USC)

• 2016-2021: ADMS Production (Awarded to Parsons/USC; Tech Transfer of ADMS)

ADMS Public Release

Policy - ADMS

• Collaboration between IMSC and Sol Price School of Public Policy

• Did Expo Line increase transit patronage?

• Did Expo Line impact traffic performance?

• Quasi-experimental design: before/after and with/without
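A before/after and with/without comparison is often summarized with a difference-in-differences estimate. The slide does not say which estimator the study used, so the sketch below is only an illustration of that common approach. The function name `diff_in_diff` and all ridership numbers are hypothetical.

```python
from statistics import mean


def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Difference-in-differences: change in the treated group minus change in the control group.

    "Treated" here would be corridors with the new rail line, "control" corridors without it.
    """
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


# Made-up daily transit boardings (not data from the study).
with_line_before = [100.0, 110.0]
with_line_after = [130.0, 140.0]
without_line_before = [90.0, 100.0]
without_line_after = [95.0, 105.0]

effect = diff_in_diff(with_line_before, with_line_after,
                      without_line_before, without_line_after)
print(effect)  # 25.0
```

Subtracting the control-group change removes trends that affect both groups, which is why the design needs both the before/after and the with/without dimension.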

Data Driven Journalism

http://www.nbclosangeles.com/news/local/USC-Freeway-LA-Traffic-Study_Los-Angeles-416848663.html

Startup: TallyGo

US Patent No. 9,286,793: Traffic prediction using real-world transportation data (March 15, 2016)

US Patent No. 8,660,789: Hierarchical & exact fastest path computation in time-dependent spatial networks (February 2014)

US Patent No. 8,566,030: Efficient K-nearest neighbor search in time-dependent spatial networks (October 2013)

• New business model (API)

• LAFD Deployment

• Target is Series-A funding in 2017
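To make "fastest path in a time-dependent spatial network" concrete, here is a minimal sketch of plain time-dependent Dijkstra, in which each edge's travel time depends on the time the vehicle enters it. This is not the hierarchical method named in the patent. It assumes FIFO edges (leaving later never means arriving earlier), and the graph, node names, and travel-time functions are made up for illustration.

```python
import heapq


def td_fastest_path(graph, source, target, depart):
    """Earliest-arrival search.

    graph maps a node to a list of (neighbor, travel_time_fn) pairs, where
    travel_time_fn(t) is the traversal time when the edge is entered at time t.
    Returns (arrival_time, path), or (None, []) if the target is unreachable.
    """
    best = {source: depart}   # earliest known arrival time per node
    prev = {}                 # predecessor on the best path found so far
    heap = [(depart, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if u == target:
            break
        if t > best.get(u, float("inf")):
            continue          # stale heap entry
        for v, travel_time in graph.get(u, []):
            arrival = t + travel_time(t)
            if arrival < best.get(v, float("inf")):
                best[v] = arrival
                prev[v] = u
                heapq.heappush(heap, (arrival, v))
    if target not in best:
        return None, []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return best[target], path[::-1]


# Hypothetical network: the direct road A->C is slow from minute 30 onward.
example_graph = {
    "A": [("C", lambda t: 10 if t < 30 else 30), ("B", lambda t: 5)],
    "B": [("C", lambda t: 12)],
}
print(td_fastest_path(example_graph, "A", "C", 0))   # (10, ['A', 'C'])
print(td_fastest_path(example_graph, "A", "C", 30))  # (47, ['A', 'B', 'C'])
```

The same origin and destination yield different routes depending on departure time, which is the property that separates time-dependent routing from static shortest paths.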

Research: Traffic Forecasting

• Single sensor: Time series analysis (ICDM’2012)

• Single sensor: Causality (ICDM’2013)

• Multi sensor: Latent Space (SIGKDD’2016)

• Multi sensor: Deep Learning (SDM’2017)

Graph matrix: G (n x n). Latent properties: U (n x k) and B (k x k).

[Figure labels: U, B, U^T]
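The matrix labels on this slide suggest that the graph matrix G is approximated by the product U B U^T, with one row of U per vertex. That reading is an assumption, and the sketch below only shows the reconstruction step, not how U and B are learned. The function names and the numbers are hypothetical.

```python
def predict_edge(U, B, i, j):
    """Return u_i * B * u_j^T, the approximated entry G[i][j]."""
    k = len(B)
    # Row vector u_i multiplied by the k x k matrix B.
    ui_b = [sum(U[i][a] * B[a][b] for a in range(k)) for b in range(k)]
    # Dot product with u_j.
    return sum(ui_b[b] * U[j][b] for b in range(k))


def reconstruct(U, B):
    """Rebuild the full n x n approximation of G from U (n x k) and B (k x k)."""
    n = len(U)
    return [[predict_edge(U, B, i, j) for j in range(n)] for i in range(n)]


# Made-up latent attributes for three vertices with k = 2.
U = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
B = [[0.0, 2.0], [2.0, 0.0]]
print(predict_edge(U, B, 0, 1))  # 2.0
print(reconstruct(U, B))
```

Because every entry is computed from two k-dimensional rows of U and the small matrix B, the model stores n*k + k*k numbers instead of n*n.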

Open Problem

• Current: Optimize for single vehicle

[Figure: Real-time Traffic Data Feed, Traffic Prediction, Routing Engine, Application Interface (Mobile, In-Car), Client]

Future: City Flow Optimization & Control Utilization System

Social Media Data Platform

Mobile Progress

• Ubiquity of mobile users: 6.5 billion mobile subscriptions, 93.5% of the world population [1]

• Technology advances on mobiles: smartphone sensors, e.g., video cameras

• Network bandwidth improvements: from 2.5G (up to 384 Kbps) to 3G (up to 14.7 Mbps) and recently 4G (up to 100 Mbps)

[1] http://mobithinking.com/mobile-marketing-tools/latest-mobile-stats

User-Generated Videos (UGVs)

Large-scale, high update rate

[Figure: Mobile Video Traffic Growth, 2009-2014, in petabytes/month. Source: Cisco]

[Figure: YouTube Hours of Video Uploaded per Minute, 6/07 – 5/13. Source: YouTube]

UGV and its Spatiotemporal Metadata

Record video using camera with sensors (mobile apps)

Metadata from sensors: GPS, compass, clock

Model geographical coverage of video scenes

A. S. Ay, R. Zimmermann, and S. H. Kim. Viewable Scene Modeling for Geospatial Video Search. In ACM Intl. Conf. on MM, pages 309–318, 2008.

FOV Queries

Problem of UGV search

Problem of FOV search

Spatial queries on UGVs

• Range queries: e.g., search videos overlapping with an area at USC.

• Directional queries: e.g., search videos directed towards the North.

Range query in MediaQ [Kim et al. MMSys14]
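The two query types above can be sketched on a simplified field-of-view (FOV) model, in which each video frame is a sector defined by camera position, compass heading, viewable angle, and visible distance. This is an illustrative sketch and not MediaQ's implementation. It assumes planar coordinates in meters, simplifies the range query to a point-in-sector test, and uses hypothetical class and function names.

```python
import math
from dataclasses import dataclass


@dataclass
class FOV:
    x: float            # camera position, east coordinate (meters)
    y: float            # camera position, north coordinate (meters)
    heading_deg: float  # compass direction of the camera, 0 = north, clockwise
    angle_deg: float    # total viewable angle of the sector
    radius: float       # maximum visible distance (meters)


def angular_diff(a, b):
    """Smallest absolute difference between two compass angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def covers_point(fov, px, py):
    """Simplified range query: is the point (px, py) inside the FOV sector?"""
    dx, dy = px - fov.x, py - fov.y
    dist = math.hypot(dx, dy)
    if dist > fov.radius:
        return False
    if dist == 0:
        return True
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0  # compass bearing to the point
    return angular_diff(bearing, fov.heading_deg) <= fov.angle_deg / 2


def directional_query(fovs, target_deg, tolerance_deg):
    """Directional query: FOVs whose heading is within a tolerance of the target direction."""
    return [f for f in fovs if angular_diff(f.heading_deg, target_deg) <= tolerance_deg]


frames = [FOV(0, 0, 0, 60, 100), FOV(0, 0, 90, 60, 100), FOV(10, 10, 350, 60, 100)]
print([covers_point(f, 0, 50) for f in frames])
print(len(directional_query(frames, 0, 15)))  # 2 frames point roughly north
```

A full range query would test the sector against a query rectangle or polygon instead of a single point, typically with a spatial index in front to prune candidates.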

MediaQ Demo

http://mediaq.usc.edu/

Application with Societal Impact

Disaster Response

GeoQ – NGA’s Disaster Response Platform

Tech Transfer: MediaQ → NGA’s GeoQ

NGA GeoQ – USC MediaQ Integration:

Use GeoQ user’s map viewport to query videos from MediaQ

IMSC provides APIs for integration

[Figure: GeoQ interface showing the analyst’s work cell, the MediaQ layer, MediaQ videos on the GeoQ interface, and a MediaQ video being played]

Data Collection and Analysis in Disaster

Media Query & Search

Data Integration

Media Collection

Health Data Platform

Analytical Technologies to Objectively Measure Human Performance (ATOM-HP)

Goal: Evaluation of Human Performance in Cancer Patients

• Joan E. Broderick: Senior Behavioral Scientist; Associate Director, Center for Self-Report Science, Center for Economic & Social Research

• Paul Newton: Mathematics and Modeling

• Cyrus Shahabi: Professor of Computer Science & Electrical Engineering

• Sanjay Purushotham: IMSC

• Luciano Nocera: IMSC

Performance Status Scale

Performance Status remains the best predictor of patient survival in patients with metastatic cancer: better than genomics, blood-based biomarkers, imaging

Evaluation limited to observations during visits

ATOM-HP: Body Sensing

Group | Sensors | Data | Task
Clinical | Band | Calories, Step Count, Heart Rate (mean, peak, min) | In the field: 60 days, 8AM-8PM
Clinical | Kinect | Raw files -> skeleton data | Clinic: 1. Chair to Table, 2. Get-Up and Go
Military | Band | Calories, Step Count, Heart Rate (mean, peak, min) | In the field: 5 days, all day
Military | Kinect | Raw files -> skeleton, face mesh, face parameters (e.g., eye open, engaged) | Controlled environment: Walk and Talk

Research: Integrated Micro & Macro Data Analysis

Skeleton Data GFT features capture

[Figure: two panels labeled Controlled Environment]

ATOM-HP Demo: SXSL 2016 at White House

ATOM-HP in the News

http://www.nbclosangeles.com/on-air/as-seen-on/Wearable-Tech-Improves-Cancer-Treatment_Los-Angeles-395200891.html

ATOM-HP is a formal recommendation by the White House. It was presented to President Obama by Vice President Biden in Sep’16 as the final outcome of the moonshot.

Smart City Data Platform

DSI Private Cloud

Moving all our datasets into a single platform for Data and Code Sharing!

DSI Uniform Data Platform for IoT

[Figure: platform architecture with the following layers]

• Sensors (Heterogeneous Sensors)

• Data Routing: Gateway or (no gateway); Publish/Subscribe (RabbitMQ) with Queue 1 / Event 1, Queue 2 / Event 2, Queue 3 / Event 3

• Data Management and Analytics: HDFS, Cassandra, ElastiCache; Apache Spark (Spark Streaming, SparkSQL, SparkR/MLlib)

• Analytics: Real-time Analytics, Operational Analytics, Batch Processing, Predictive Analytics, OLAP, Data/Knowledge Engineering
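The publish/subscribe routing layer in this architecture can be illustrated with a small in-memory stand-in, in which each event type is bound to one or more queues and consumers read from their own queue. This sketch does not use RabbitMQ or any RabbitMQ client library. The `Broker` class, the queue and event names, and the sensor payload are all hypothetical.

```python
from collections import defaultdict, deque


class Broker:
    """Toy in-memory publish/subscribe broker (a stand-in, not RabbitMQ)."""

    def __init__(self):
        self.bindings = defaultdict(list)  # event type -> names of bound queues
        self.queues = defaultdict(deque)   # queue name -> pending messages

    def bind(self, queue_name, event_type):
        """Subscribe a queue to an event type."""
        self.bindings[event_type].append(queue_name)

    def publish(self, event_type, payload):
        """Copy the payload to every queue bound to the event type; return the fan-out count."""
        targets = self.bindings.get(event_type, [])
        for queue_name in targets:
            self.queues[queue_name].append(payload)
        return len(targets)

    def consume(self, queue_name):
        """Pop the oldest message from a queue, or return None if it is empty."""
        queue = self.queues[queue_name]
        return queue.popleft() if queue else None


broker = Broker()
broker.bind("queue_1", "event_1")
broker.bind("queue_2", "event_2")
broker.publish("event_1", {"sensor": "s1", "speed_mph": 42})
print(broker.consume("queue_1"))
print(broker.consume("queue_2"))  # None: nothing was published for event_2
```

Decoupling publishers from consumers this way lets heterogeneous sensors push readings without knowing which analytics jobs will read them.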

Closing Remarks

Team

[Figure: Total (7 Years), with categories Federal, State/city, Industry, Internal, International, foundation]

Impact – Workforce

Graduates in the last 5 years

• PhD: Afsin Akdogan, Huy Pham, Bei Pan, Houtan Shirani-Mehr, Ali Khodaei, Leyla Kazemi, Ugur Demiryurek, Ling Hu, Songhua Xing, Ali Khoshgozaran

• Selected MS and Undergrad: Shireesh Asthana, Jiayun Ge, Yu Sun, Ashley Luo, Jingyi Du, Nicholas Bopp, Junyuan Shi, Colin Gu, Vanessa Kuroda, Ning Jiang

DSI Value Add to its Partners

• Our vision, expertise, background & experience in

– Fundamental and applied research

– Multidisciplinary research

– Integrated system development

• Our test-beds

• Government/Federal customers

• Industry Partners

• Global Reach

• Educational Presence