1 © Copyright 2015 Pivotal. All rights reserved.

Internet of Things: How Data Science-Driven Software is Eating the Connected World

Sarah Aerni, Principal Data Scientist, Pivotal (@itweetsarah)

Hadoop Summit, San Jose, CA, June 10th

2 © Copyright 2015 Pivotal. All rights reserved.

Our everyday devices are smart and talk to us

3 © Copyright 2015 Pivotal. All rights reserved.

These devices are now talking to each other

4 © Copyright 2015 Pivotal. All rights reserved.

Connected devices take action to make daily life easier.

But what else?

5 © Copyright 2015 Pivotal. All rights reserved.

How can IoT help prevent accidents like the Macondo disaster?

6 © Copyright 2015 Pivotal. All rights reserved.

In all industries billions of data points represent opportunities for the Internet of Things

• Gene Sequencing: cost to sequence one genome has fallen from $100M in 2001 to $10K in 2011 to $1K in 2014
• Smart Grids: reading smart meters every 15 minutes is 3000x more data intensive
• Stock Market
• Social Media: Facebook uploads 250 million photos each day
• Oil Exploration: oil rigs generate 25,000 data points per second
• Video Surveillance
• Medical Imaging
• Mobile Sensors
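The volume figures above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes that the baseline for the smart-meter comparison is one reading per month and that a month has 30 days; the slide states neither. The oil-rig rate is the one quoted on the slide.

```python
# Back-of-the-envelope check of the data-volume figures quoted above.
# Assumption (not stated on the slide): the smart-meter baseline is one
# reading per month, and a month has 30 days.

SECONDS_PER_DAY = 24 * 60 * 60


def readings_per_month(interval_minutes: int, days: int = 30) -> int:
    """Number of meter readings in a month at a fixed reading interval."""
    return (24 * 60 // interval_minutes) * days


def points_per_day(points_per_second: int) -> int:
    """Number of sensor data points generated in one day."""
    return points_per_second * SECONDS_PER_DAY


smart_meter_ratio = readings_per_month(15)  # versus 1 reading per month
oil_rig_daily = points_per_day(25_000)      # rate quoted on the slide

print(f"15-minute readings per month: {smart_meter_ratio}")
print(f"Oil-rig data points per day: {oil_rig_daily:,}")
```

Under these assumptions the smart-meter ratio comes out at 2,880, in line with the roughly 3000x quoted, and a single rig produces about 2.16 billion data points per day.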

7 © Copyright 2015 Pivotal. All rights reserved.

Smart Systems = Sensors + Digital Brain + Actuators

Problem Formulation

Modeling Step

Data Step

Application Step

Data Science for Building Models

Sensors & Actuators

Data Lake
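The "Sensors + Digital Brain + Actuators" equation can be read as a sense, decide, act loop. The sketch below is purely illustrative: `digital_brain`, `run_smart_system`, and the threshold rule standing in for a trained model are hypothetical and not taken from the talk.

```python
from typing import Callable, Iterable, List


def digital_brain(reading: float, threshold: float = 100.0) -> str:
    """Stand-in for a trained model: map a sensor reading to an action."""
    return "slow_down" if reading > threshold else "continue"


def run_smart_system(
    readings: Iterable[float],
    decide: Callable[[float], str],
    actuate: Callable[[str], None],
) -> None:
    """Sensors -> digital brain -> actuators, one reading at a time."""
    for reading in readings:      # sensors: incoming measurements
        action = decide(reading)  # digital brain: model-driven decision
        actuate(action)           # actuators: carry out the decision


# Hypothetical readings; the "actuator" here just records each action.
actions: List[str] = []
run_smart_system([80.0, 120.0, 95.0], digital_brain, actions.append)
print(actions)
```

In a real system the decision function would be the model produced by the data science steps, and the actuator callback would drive actual equipment rather than append to a list.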

10 © Copyright 2015 Pivotal. All rights reserved.

How can data drive true, automated action?

How does this… …become this?

11 © Copyright 2015 Pivotal. All rights reserved.

How can data drive true, automated action?

• How is data collected?

• Where is it stored and processed?

• Is there real signal or just noise?

• How can we build a predictive model?

• When is the right time to take action?

15 © Copyright 2015 Pivotal. All rights reserved.

How to build a predictive model at scale

Use Cases
• Oil Drilling
• Vaccine Manufacturing
• Treating Patients

Critical considerations for successful modeling
• Data-driven paradigms, data cleansing and feature engineering
• Tradeoffs between model accuracy and timeliness
• Derive insight from models to change processes

19 © Copyright 2015 Pivotal. All rights reserved.

Data: The New Oil

[Image: Drilling into the San Andreas Fault at Parkfield, California. Credit: Stephen H. Hickman, USGS]

• Oil & gas exploration and production activities generate large amounts of data from sensors
• What opportunities exist for data-driven approaches to improve operations?

Drilling operations
• Predicting drill rate-of-penetration (ROP)
• Motivation: Shorter time to oil production lowers costs and increases production
• Goals
  – Identify optimal parameters for rapid drilling
  – Create an initial approach for drilling
  – Potential reduction in damage to equipment

Predictive maintenance
• Predict equipment function and failure
• Motivation: Failure costs estimated at $150,000/incident (billions annually)*
• Goals
  – Early warning system
  – Insights into prominent features impacting operation and failure
  – Reduction of non-productive drill time
  – Reduced incidents

*http://blog.pivotal.io/pivotal/case-studies-2/data-as-the-new-oil-producing-value-for-the-oil-gas-industry
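To make the ROP prediction goal concrete, here is a minimal sketch that fits a straight line relating weight on bit to rate of penetration. It is not the model used in the talk: the single-predictor least-squares fit, the helper `fit_line`, and the numbers are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic values for illustration only (not data from the talk).
wob = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
rop = [90.0, 100.0, 110.0, 120.0, 130.0, 140.0]

slope, intercept = fit_line(wob, rop)
predicted_rop = intercept + slope * 15.0  # predicted ROP at WOB = 15
print(slope, intercept, predicted_rop)
```

A realistic model would use many more features (RPM, drill bit details, formation properties) and far more data; the point here is only the shape of the task, which is to learn a mapping from drilling parameters to ROP and then query it for candidate settings.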

21 © Copyright 2015 Pivotal. All rights reserved.

How are models built using sensor data?

Integrating & Cleansing

Feature Building

Modeling

Primary data sources

Operator Data (thousands of records)
• Failure details
• Component details
• Drill Bit details

Drill Rig Sensor Data (billions of records)
• Rate of Penetration (ROP)
• RPM
• Weight on Bit (WOB)

Integrated Data
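One way to picture the integration step is an as-of join that attaches the operator's drill bit record to each sensor reading. The sketch below is an assumption about how such a join might look, not the pipeline from the talk; the record layouts, the bit ids, and the helper `bit_in_use` are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical operator records: (time the bit went into service, bit id),
# sorted by time. Times are seconds since the start of the run.
bit_changes: List[Tuple[int, str]] = [(0, "bit-A"), (1800, "bit-B")]

# Hypothetical sensor readings: (time, ROP, RPM, WOB).
sensor_rows: List[Tuple[int, float, float, float]] = [
    (600, 110.0, 120.0, 15.0),
    (1700, 95.0, 118.0, 16.0),
    (2400, 130.0, 121.0, 14.0),
]


def bit_in_use(ts: int, changes: List[Tuple[int, str]]) -> Optional[str]:
    """Return the id of the most recent bit installed at or before ts."""
    change_times = [t for t, _ in changes]
    idx = bisect_right(change_times, ts) - 1
    return changes[idx][1] if idx >= 0 else None


# Integrated data: each sensor row extended with the bit in use at that time.
integrated = [row + (bit_in_use(row[0], bit_changes),) for row in sensor_rows]
for row in integrated:
    print(row)
```

With billions of sensor records this kind of join would run inside a parallel data platform rather than in a Python loop, but the logic of matching each reading to the latest preceding operator event stays the same.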

22 © Copyright 2015 Pivotal. All rights reserved.

How are models built using sensor data?

Integrating & Cleansing

[Figure: ROP vs. Time scatter plot; x-axis df$ts_utc (00:00 to 00:00, 10-minute ticks), y-axis df$rop (60 to 140)]

Primary data sources

Operator Data ( thousands of records )

•  Failure details •  Component details •  Drill Bit details

Drill Rig Sensor Data ( billions of records )

•  Rate of Penetration (ROP) •  RPM •  Weight on Bit (WOB)
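The talk does not say which cleansing steps were applied to the sensor series. As one common possibility, the sketch below flags readings that sit far from a centered rolling median; the function `flag_spikes`, the window size, the tolerance, and the values are assumptions for illustration.

```python
from statistics import median
from typing import List


def flag_spikes(values: List[float], window: int = 5, tol: float = 20.0) -> List[bool]:
    """Flag values that differ from the centered rolling median by more than tol."""
    half = window // 2
    flags = []
    for i, v in enumerate(values):
        # The window is truncated at the edges of the series.
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        flags.append(abs(v - median(values[lo:hi])) > tol)
    return flags


# Synthetic ROP readings with one obvious spike (not data from the talk).
rop = [100.0, 102.0, 98.0, 160.0, 101.0, 99.0, 100.0]
flags = flag_spikes(rop)
cleaned = [v for v, bad in zip(rop, flags) if not bad]
print(flags)
print(cleaned)
```

Whether a flagged point is noise or a real event (for example a drill bit change) is a domain question, so in practice flagged readings would be reviewed against the operator data before being dropped.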

23 © Copyright 2015 Pivotal. All rights reserved.

How are models built using sensor data?

Integrating & Cleansing

[Figure: ROP vs. Time scatter plot; x-axis df$ts_utc (00:00 to 00:00, 10-minute ticks), y-axis df$rop (60 to 140)]

Drill bit changes

24 © Copyright 2015 Pivotal. All rights reserved.

How are models built using sensor data?

Integrating & Cleansing

[Figure: scatter plot labeled WOB vs. Time; extracted axis labels: df$ts_utc (00:00 to 00:00, 10-minute ticks) and df$rop (60 to 140)]

25 © Copyright 2015 Pivotal. All rights reserved.

How are models built using sensor data?

Integrating & Cleansing

[Figure: WOB vs. Time scatter plot; x-axis df$ts_utc (00:00 to 00:00, 10-minute ticks), y-axis df$wob (10 to 20)]

26 © Copyright 2015 Pivotal. All rights reserved.

How are models built using sensor data?

Integrating & Cleansing

[Figure: WOB vs. Time scatter plot; x-axis df$ts_utc (00:00 to 00:00, 10-minute ticks), y-axis df$wob (10 to 20)]

27 © Copyright 2015 Pivotal. All rights reserved.

How are models built using sensor data?

Integrating & Cleansing

[Figure: WOB vs. Time scatter plot; x-axis df$ts_utc (00:00 to 00:00, 10-minute ticks), y-axis df$wob (10 to 20)]

Primary data sources

Operator Data (thousands of records)
•  Failure details
•  Component details
•  Drill Bit details

Drill Rig Sensor Data (billions of records)
•  Rate of Penetration (ROP)
•  RPM
•  Weight on Bit

28 © Copyright 2015 Pivotal. All rights reserved.

How are models built using sensor data?

Integrating & Cleansing

[Figure: Weight on Bit (WOB) against time, raw sensor readings; x-axis df$ts_utc in 10-minute ticks, y-axis df$wob from 10 to 20]

Primary data sources

Operator Data (thousands of records)
•  Failure details
•  Component details
•  Drill Bit details

Drill Rig Sensor Data (billions of records)
•  Rate of Penetration (ROP)
•  RPM
•  Weight on Bit

A cleansing approach: use average across a window
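As a rough illustration of window averaging, the sketch below groups timestamped readings into fixed, non-overlapping windows and keeps one mean per window. The function name, the 60-second window and the sample values are assumptions made for the example; the slides do not say which window size or tooling was used.

```python
from statistics import mean


def window_average(readings, window_seconds):
    """Average (timestamp, value) readings over fixed, non-overlapping windows.

    Returns a list of (window_start, mean_value) pairs sorted by window_start.
    Readings whose value is None (dropped samples) are ignored.
    """
    buckets = {}
    for ts, value in readings:
        if value is None:
            continue
        # Floor the timestamp to the start of its window.
        start = (ts // window_seconds) * window_seconds
        buckets.setdefault(start, []).append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Hypothetical WOB readings: (seconds since start of run, weight on bit).
wob = [(0, 10.0), (1, 12.0), (2, None), (61, 20.0), (62, 22.0), (125, 15.0)]
smoothed = window_average(wob, window_seconds=60)
print(smoothed)
```

A longer window suppresses more sensor noise but also blurs short-lived events, so the window size is a tuning choice rather than a fixed rule.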

29 © Copyright 2015 Pivotal. All rights reserved.

How are models built using sensor data?

Integrating & Cleansing

[Figure: Weight on Bit (WOB) against time; x-axis df$ts_utc in 10-minute ticks, y-axis df$wob from 10 to 20]

Primary data sources

Operator Data (thousands of records)
•  Failure details
•  Component details
•  Drill Bit details

Drill Rig Sensor Data (billions of records)
•  Rate of Penetration (ROP)
•  RPM
•  Weight on Bit (WOB)

30 © Copyright 2015 Pivotal. All rights reserved.

How are models built using sensor data?

Integrating & Cleansing

Feature Building Modeling

[Figure: sensor traces for one drilling run, one panel each for bit position, RPM, ROP and WOB]

•  A failure occurred at the end of this run

31 © Copyright 2015 Pivotal. All rights reserved.

How are models built using sensor data?

Integrating & Cleansing

Feature Building Modeling

•  A failure occurred at the end of this run

•  Taking a window of time prior to failure, what features should we extract (e.g. variance of RPM, max bit position velocity)?

[Figure: bit position trace]

32 © Copyright 2015 Pivotal. All rights reserved.

How are models built using sensor data?

Integrating & Cleansing

Feature Building Modeling

•  A failure occurred at the end of this run

•  Taking a window of time prior to failure, what features should we extract (e.g. variance of RPM, max bit position velocity)?

[Figure: RPM trace]
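The two example features named on the slide (variance of RPM, maximum bit position velocity) could be computed from a pre-failure window roughly as follows. This is a minimal sketch: the record layout (`t`, `rpm`, `bit_position`), the window length and the sample run are all assumptions for illustration, not the format of the actual rig data.

```python
from statistics import pvariance


def failure_window_features(samples, failure_time, window_seconds):
    """Summarise the window of samples immediately before a failure.

    samples: list of dicts with keys "t" (seconds), "rpm" and "bit_position".
    Returns None when fewer than two samples fall inside the window.
    """
    window = sorted(
        (s for s in samples
         if failure_time - window_seconds <= s["t"] < failure_time),
        key=lambda s: s["t"],
    )
    if len(window) < 2:
        return None

    rpm_variance = pvariance([s["rpm"] for s in window])

    # Velocity is approximated by the finite difference between neighbours.
    velocities = [
        abs(b["bit_position"] - a["bit_position"]) / (b["t"] - a["t"])
        for a, b in zip(window, window[1:])
        if b["t"] > a["t"]
    ]
    return {
        "rpm_variance": rpm_variance,
        "max_bit_velocity": max(velocities, default=0.0),
    }


# Hypothetical run that fails at t = 5 seconds.
run = [
    {"t": 0, "rpm": 100.0, "bit_position": 0.0},
    {"t": 1, "rpm": 100.0, "bit_position": 1.0},
    {"t": 2, "rpm": 110.0, "bit_position": 3.0},
    {"t": 3, "rpm": 90.0, "bit_position": 4.0},
    {"t": 4, "rpm": 100.0, "bit_position": 4.0},
]
features = failure_window_features(run, failure_time=5, window_seconds=4)
print(features)
```

The same function can be applied to windows that do not end in a failure, which gives the negative examples a classifier needs.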

33 © Copyright 2015 Pivotal. All rights reserved.

How are models built using sensor data?

Predict occurrence of equipment failure in a chosen future time window

Predict remaining life of equipment

Predict Rate-of-Penetration

Integrating & Cleansing

Feature Building Modeling

34 © Copyright 2015 Pivotal. All rights reserved.

How are models built using sensor data?

Predict occurrence of equipment failure in a chosen future time window
•  Logistic Regression
•  Elastic Net Regularized Regression (Binomial)
•  Support Vector Machines

Predict remaining life of equipment

Predict Rate-of-Penetration

Integrating & Cleansing

Feature Building Modeling

35 © Copyright 2015 Pivotal. All rights reserved.

How are models built using sensor data?

Predict occurrence of equipment failure in a chosen future time window
•  Logistic Regression
•  Elastic Net Regularized Regression (Binomial)
•  Support Vector Machines

Predict remaining life of equipment
•  Cox Proportional Hazards Regression

Predict Rate-of-Penetration

Integrating & Cleansing

Feature Building Modeling

36 © Copyright 2015 Pivotal. All rights reserved.

How are models built using sensor data?

Predict occurrence of equipment failure in a chosen future time window
•  Logistic Regression
•  Elastic Net Regularized Regression (Binomial)
•  Support Vector Machines

Predict remaining life of equipment
•  Cox Proportional Hazards Regression

Predict Rate-of-Penetration
•  Linear Regression
•  Elastic Net Regularized Regression (Gaussian)
•  Support Vector Machines

Integrating & Cleansing

Feature Building Modeling
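For the first task, failure within a chosen future time window, the binary models listed need a 0/1 target per observation. One plausible way to build that target is sketched below; the function name, the horizon value and the timestamps are assumptions for illustration, and the slides do not describe how labels were actually constructed.

```python
from bisect import bisect_right


def label_failure_within(observation_times, failure_times, horizon):
    """Return 1 for each observation followed by a failure within `horizon`.

    A failure counts when it happens strictly after the observation and no
    later than `horizon` time units after it.
    """
    failures = sorted(failure_times)
    labels = []
    for t in observation_times:
        # Index of the first failure strictly after t.
        i = bisect_right(failures, t)
        hit = i < len(failures) and failures[i] - t <= horizon
        labels.append(1 if hit else 0)
    return labels


# Hypothetical observation times (minutes) and a single failure at minute 35.
labels = label_failure_within([0, 10, 20, 30, 40], [35], horizon=15)
print(labels)
```

Widening the horizon produces more positive examples but gives operators a less specific warning, so the window is itself a modeling decision.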

37 © Copyright 2015 Pivotal. All rights reserved.

Linear Regression: Streaming Algorithm

•  Finding linear dependencies between variables
   –  ROP = c_0 + c_WOB * WOB

[Figure: scatter plot of Rate of Penetration (ROP) against Weight on Bit (WOB)]

38 © Copyright 2015 Pivotal. All rights reserved.

Linear Regression: Streaming Algorithm

•  Finding linear dependencies between variables

[Figure: scatter plot of Rate of Penetration (ROP) against Weight on Bit (WOB)]

39 © Copyright 2015 Pivotal. All rights reserved.

Linear Regression: Streaming Algorithm

•  Finding linear dependencies between variables

•  How to compute with a single scan?

[Figure: scatter plot of Rate of Penetration (ROP) against Weight on Bit (WOB)]
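One standard answer to the single-scan question is to keep running sums and solve the normal equations at the end. The sketch below does this for the one-variable model ROP = c_0 + c_WOB * WOB; the class name and the sample points are assumptions for illustration, and this is not presented as the speaker's or MADlib's actual code.

```python
class StreamingLinearFit:
    """Least-squares fit of y = c0 + c1 * x using one pass over the data."""

    def __init__(self):
        self.n = 0
        self.sx = 0.0   # sum of x
        self.sy = 0.0   # sum of y
        self.sxx = 0.0  # sum of x * x
        self.sxy = 0.0  # sum of x * y

    def update(self, x, y):
        """Fold one (x, y) observation into the running sums."""
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def coefficients(self):
        """Return (intercept, slope) from the accumulated sums."""
        denom = self.n * self.sxx - self.sx ** 2
        if denom == 0:
            raise ValueError("need at least two distinct x values")
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope


# Hypothetical (WOB, ROP) readings lying exactly on ROP = 5 + 2 * WOB.
fit = StreamingLinearFit()
for wob, rop in [(0.0, 5.0), (1.0, 7.0), (2.0, 9.0), (3.0, 11.0), (4.0, 13.0)]:
    fit.update(wob, rop)
c0, c_wob = fit.coefficients()
print(c0, c_wob)
```

Only five numbers are kept regardless of how many rows are scanned, which is what makes the approach suitable for billions of sensor records.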

40 © Copyright 2015 Pivotal. All rights reserved.

Linear Regression: Parallel Computation

41 © Copyright 2015 Pivotal. All rights reserved.

Linear Regression: Parallel Computation

Segment 1 Segment 2

42 © Copyright 2015 Pivotal. All rights reserved.

Linear Regression: Parallel Computation

Segment 1 Segment 2
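The two-segment picture can be read as follows: each segment computes partial sums over its own rows, and the partial results are added together before the final solve. The sketch below illustrates that idea for a one-variable fit; the function names and data are assumptions for illustration, and it is a simplified stand-in rather than MADlib's implementation.

```python
from functools import reduce


def segment_stats(rows):
    """Scan one segment's (x, y) rows once and return its partial sums."""
    n, sx, sy, sxx, sxy = 0, 0.0, 0.0, 0.0, 0.0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)


def merge_stats(a, b):
    """Combine partial sums from two segments by element-wise addition."""
    return tuple(p + q for p, q in zip(a, b))


def solve(stats):
    """Turn merged sums into (intercept, slope) for y = c0 + c1 * x."""
    n, sx, sy, sxx, sxy = stats
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("need at least two distinct x values")
    slope = (n * sxy - sx * sy) / denom
    return (sy - slope * sx) / n, slope


# Hypothetical (WOB, ROP) rows split across two segments.
segment_1 = [(0.0, 5.0), (1.0, 7.0)]
segment_2 = [(2.0, 9.0), (3.0, 11.0), (4.0, 13.0)]

partials = [segment_stats(seg) for seg in (segment_1, segment_2)]
c0, c_wob = solve(reduce(merge_stats, partials))
print(c0, c_wob)
```

Because addition is associative, the merged result matches a single scan over all rows, and the per-segment scans can run independently.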

43 © Copyright 2015 Pivotal. All rights reserved.

Linear regression on 10 million rows in seconds

[Figure: execution time (s) against number of independent variables, with one curve each for 6, 12, 18 and 24 segments]

Hellerstein, Joseph M., et al. "The MADlib analytics library: or MAD skills, the SQL." Proceedings of the VLDB Endowment 5.12 (2012): 1700-1711.

44 © Copyright 2015 Pivotal. All rights reserved.

BIG DATA MACHINE LEARNING IN SQL http://madlib.net/

Predictive Modeling Library

Linear Systems
•  Sparse and Dense Solvers

Matrix Factorization
•  Singular Value Decomposition (SVD)
•  Low-Rank

Generalized Linear Models
•  Linear Regression
•  Logistic Regression
•  Multinomial Logistic Regression
•  Cox Proportional Hazards Regression
•  Elastic Net Regularization
•  Sandwich Estimators (Huber-White, clustered, marginal effects)

Machine Learning Algorithms
•  Principal Component Analysis (PCA)
•  Association Rules (Affinity Analysis, Market Basket)
•  Topic Modeling (Parallel LDA)
•  Decision Trees
•  Ensemble Learners (Random Forests)
•  Support Vector Machines
•  Conditional Random Field (CRF)
•  Clustering (K-means)
•  Cross Validation

Descriptive Statistics

Sketch-based Estimators
•  CountMin (Cormode-Muthukrishnan)
•  FM (Flajolet-Martin)
•  MFV (Most Frequent Values)

Correlation

Summary

Support Modules
•  Array Operations
•  Sparse Vectors
•  Random Sampling
•  Probability Functions
•  PMML Export

45 © Copyright 2015 Pivotal. All rights reserved.

Critical Considerations for Successful Modeling

Data-driven paradigms, data cleansing and feature engineering

Use Cases

Oil Drilling, Vaccine Manufacturing, Treating Patients

How to build a predictive model at scale

Derive insight from models to change processes

Tradeoffs between model accuracy and timeliness

46 © Copyright 2015 Pivotal. All rights reserved.

Opportunities for Data-Driven Decisions in Pharma

47 © Copyright 2015 Pivotal. All rights reserved.

Internet of Things in Manufacturing

Input materials → Mix → Incubate → Filter → Centrifuge → Final Product

48 © Copyright 2015 Pivotal. All rights reserved.

A pipeline of sensors and opportunities for optimizing output
Internet of Things in Manufacturing

Input materials → Mix → Incubate → Filter → Centrifuge → Final Product

49 © Copyright 2015 Pivotal. All rights reserved.

A pipeline of sensors and opportunities for optimizing output
Internet of Things in Manufacturing

Automated raw materials mixing

Input materials → Mix → Incubate → Filter → Centrifuge → Final Product

50 © Copyright 2015 Pivotal. All rights reserved.

A pipeline of sensors and opportunities for optimizing output
Internet of Things in Manufacturing

High-Content Screens

Automated raw materials mixing

Input materials → Mix → Incubate → Filter → Centrifuge → Final Product

51 © Copyright 2015 Pivotal. All rights reserved.

A pipeline of sensors and opportunities for optimizing output
Internet of Things in Manufacturing

Sensors

High-Content Screens

Automated raw materials mixing

[Figure: sensor traces: temperature against time, absorbance against elution volume, velocity against time]

Input materials → Mix → Incubate → Filter → Centrifuge → Final Product

52 © Copyright 2015 Pivotal. All rights reserved.

A pipeline of sensors and opportunities for optimizing output
Internet of Things in Manufacturing

High-Content Screens

Automated raw materials mixing

[Figure: sensor traces: temperature against time, absorbance against elution volume, velocity against time]

Input materials → Mix → Incubate → Filter → Centrifuge → Final Product

•  What opportunities exist for intervention, correction?
•  Which attributes should be used as features in a model?
•  When is the appropriate time to take action?

53 © Copyright 2015 Pivotal. All rights reserved.

A pipeline of sensors and opportunities for optimizing output
Internet of Things in Manufacturing

•  What opportunities exist for intervention, correction?
•  Which attributes should be used as features in a model?
•  When is the appropriate time to take action?

High-Content Screens

Automated raw materials mixing

[Figure: sensor traces: temperature against time, absorbance against elution volume, velocity against time]

Input materials → Mix → Incubate → Filter → Centrifuge → Final Product

>6 months

54 © Copyright 2015 Pivotal. All rights reserved.

Predicting vaccine potency using manufacturing data
Model generation and evaluation

Input materials → Mix → Incubate → Filter → Centrifuge → Final Product

[Figure: predicted potency against true potency]

>6 months

55 © Copyright 2015 Pivotal. All rights reserved.

Predicting vaccine potency using manufacturing data
Model generation and evaluation

Input materials → Mix → Incubate → Filter → Centrifuge → Final Product

[Figure: predicted potency against true potency]

Data Integration

Feature Building Modeling

>6 months

56 © Copyright 2015 Pivotal. All rights reserved.

Predicting vaccine potency using manufacturing data
Model generation and evaluation

•  Tracing product through pipeline
•  Integrating manual and automated data collection
•  Missing data and outliers

Data Integration

Feature Building Modeling

Input materials → Mix → Incubate → Filter → Centrifuge → Final Product

[Figure: predicted potency against true potency]

>6 months

57 © Copyright 2015 Pivotal. All rights reserved.

Predicting vaccine potency using manufacturing data
Model generation and evaluation

•  Extract multiple features from particular steps (duration, mean, median, etc.)
•  Considerations
   –  Tunable vs. measures
   –  Step in pipeline

Data Integration

Feature Building Modeling

Input materials → Mix → Incubate → Filter → Centrifuge → Final Product

[Figure: predicted potency against true potency]

>6 months

[Figure: sensor traces: temperature against time, absorbance against elution volume, velocity against time]
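Extracting several summary features from one pipeline step (duration, mean, median) might look like the sketch below. The step name, the trace format and the temperature values are assumptions for illustration; the actual manufacturing data layout is not shown in the slides.

```python
from statistics import mean, median


def step_features(step_name, trace):
    """Summarise one step's sensor trace as a flat feature dictionary.

    trace: non-empty list of (timestamp_seconds, value) pairs for the step.
    """
    times = [t for t, _ in trace]
    values = [v for _, v in trace]
    return {
        f"{step_name}_duration": max(times) - min(times),
        f"{step_name}_mean": mean(values),
        f"{step_name}_median": median(values),
    }


# Hypothetical temperature readings taken during the incubation step.
incubate_temp = [(0, 36.5), (600, 37.0), (1200, 37.5), (1800, 39.0)]
features = step_features("incubate_temp", incubate_temp)
print(features)
```

Prefixing each feature with its step name keeps the pipeline position visible later, which helps when deciding whether a feature is a tunable setting or only a measured marker.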

58 © Copyright 2015 Pivotal. All rights reserved.

Predicting vaccine potency using manufacturing data
Model generation and evaluation

Input materials → Mix → Incubate → Filter → Centrifuge → Final Product

[Figure: predicted potency against true potency]

Data Integration

Feature Building Modeling

>6 months

•  Partial least squares
•  Random forest
•  Regularized regression

59 © Copyright 2015 Pivotal. All rights reserved.

Interpreting the utility of a measure obtained during manufacturing based on model outcomes

Building insights from models

•  Some features may reveal tunable parameters to alter potency, others may simply be markers
•  Opportunities to provide real-time feedback on data entry errors and predicted potency outcomes

[Figure: two scatter plots of potency: against assayed value (correlation = 0.45) and against duration of a step (correlation = 0.38)]

60 © Copyright 2015 Pivotal. All rights reserved.

Critical Considerations for Successful Modeling

Data-driven paradigms, data cleansing and feature engineering

Use Cases

Oil Drilling, Vaccine Manufacturing, Treating Patients

How to build a predictive model at scale

Derive insight from models to change processes

Tradeoffs between model accuracy and timeliness

61 © Copyright 2015 Pivotal. All rights reserved.

Internet of Things in Healthcare
Improving Patient Outcomes and Increasing Efficiency

62 © Copyright 2015 Pivotal. All rights reserved.

Beyond monitor alerts for crashing patients–Prediction means prevention
Powering the Connected Hospital

Clinical Narratives

63 © Copyright 2015 Pivotal. All rights reserved.

Use Cases in Healthcare
Building a case for leveraging data and data science within a hospital setting

SAMPLE USE CASES
•  Prevent unnecessary ED visits using air quality and patient histories to anticipate needed prescription refills
•  Avoid keeping patients longer than needed due to poor coordination by predicting patient length-of-stay, leading to on-time planning
•  Early alerts for deteriorating patients to increase monitoring and improve outcomes
•  Prevent discharging patients prematurely through patient readmission models
•  Improve treatment pathways via mortality models for sepsis

INFLUENCE CHANGE by finding drivers in the models

IMPROVE customer MODELS using data-driven approaches

LEVERAGE previously inaccessible DATA sources

Environment Approach Insights

64 © Copyright 2015 Pivotal. All rights reserved.

Data & Platform Overview

Pivotal HD

Pivotal HAWQ

DATA PLATFORM

TOOLS

•  Data obtained from EPIC
•  Total unique encounters: 242,312,567
•  Total unique patient IDs: 11,195,934
•  Encounters from 6 healthcare settings (including hospitals, skilled nursing facilities, ambulance and dialysis)
   –  8 total hospitals used in LOS
   –  2 regions
•  9 years of data

EPIC

DIAGNOSES PROCEDURES

LABORATORY VALUES

MONITOR FEEDS

BED OCCUPANCY

ORDERS

65 © Copyright 2015 Pivotal. All rights reserved.

Engineering over 300 features to improve models

Simple SQL enables rapid generation of many creative features

•  Processing performed in the database without having to move the data with very simple SQL code

•  Reduced time to generate and examine features enables rapid iterations

•  Test hypotheses rapidly to examine if features have an effect on LOS

Patient Demographics

Patient Medical History

Current Admission

Prior Hospitalizations

ED Stay

Outpatient Utilization

Hospital Attributes

Lab Results (last 72 hrs)

66 © Copyright 2015 Pivotal. All rights reserved.

Understanding drivers of length of stay through model interpretation
Model Results and Insights into Patient Outcomes

Data-driven approaches improved model fit by 66% and predict patient length of stay in the hospital within 22 hours of true discharge (on average)

Patient history offers less information for AMI
Recent observations (from current admission), labs and hospital features are more predictive of length of stay than patient medical history

[Figure: variance explained when a feature category is excluded; categories: Current Admission, Lab, Medical History, Demographics, Hospital, None (complete model)]

Patient Demographics

Patient Medical History

Current Admission

Prior Hospitalizations

ED Stay

Outpatient Utilization

Hospital Attributes

Lab Results (last 72 hrs)

67 © Copyright 2015 Pivotal. All rights reserved.

Insight into hospital operations
Length of stay is not only biology. Admission time, day of week, a hospital's size and a hospital's experience with cardiology matter

Understanding drivers of length of stay through model interpretation
Model Results and Insights into Patient Outcomes

Data-driven approaches improved model fit by 66% and predict patient length of stay in the hospital within 22 hours of true discharge (on average)

Patient history offers less information for AMI
Recent observations (from current admission), labs and hospital features are more predictive of length of stay than patient medical history

[Figure: variance explained when a feature category is excluded; categories: Current Admission, Lab, Medical History, Demographics, Hospital, None (complete model)]

[Figure: number of admissions by hour of the day]

[Figure: number of discharges by hour of the day]

68 © Copyright 2015 Pivotal. All rights reserved.

The Promise of Internet of Humans

•  Smart contact lenses and sensors to identify and alert patients before catastrophic events (e.g. blood sugar drop for diabetics)
•  Wearables to track patient disease progression using objective measures
•  Track patient adherence
•  Detect disease outbreaks using sequencing in sewer system samples
•  ECG monitoring on mobile phones for early alerting of stroke

69 © Copyright 2015 Pivotal. All rights reserved.

http://blog.pivotal.io/data-science-pivotal

Check out the Pivotal Data Science Blog!

70 © Copyright 2015 Pivotal. All rights reserved.

FOR FURTHER INFO, CHECK OUT…

•  Pivotal Data Product Info, Docs and Downloads @ http://pivotal.io/big-data

•  Pivotal Blog @ http://blog.pivotal.io

•  Pivotal Data Science Blog @ http://blog.pivotal.io/data-science-pivotal

•  Pivotal Academy @ https://pivotal.biglms.com

•  Or reach out to your local Pivotal Account Executive…