Painless Computing Models for Ambitious Data Science Hatef Monajemi, May 18 2018 Symposium on Data Science and Statistics (SDSS18)

Painless Computing Models for Ambitious Data Science

Hatef Monajemi, May 18 2018

Symposium on Data Science and Statistics (SDSS18)

Coauthors

Riccardo Murri (UZH)

Victoria Stodden (UIUC)

Eric Jonas (UCB)

Percy Liang (Stanford)

David Donoho (Stanford)

@stats285

2012: The world changed

How to advance knowledge?

use a better mathematical model

experiment until you find a winner

2012

1800

A.I. Apocalypse

What happened?

The Great IT Enrichment

“Six decades into the computer revolution, four decades since the invention of microprocessors, and two decades into the rise of modern internet, all of the technology required to transform industries through software finally works and can be widely delivered at a global scale.”

Marc Andreessen, “Why Software Is Eating the World,” WSJ, 2011

Cloud provides millions of servers globally

Same-day delivery of 10k-100k CPU hours

3 cents per CPU hour, 45 cents per GPU hour

Open-source Software and Frameworks galore

High-Speed Internet

Science goes digital

• Traditionally

1. Deduction (Math proofs)

2. Induction (Physical sciences)

• Emerging new approach

3. Massive Computational Experiments (MCE)

MCE Transforming Science

Compute used in the largest AI training runs doubles every 3.5 months

300,000x since 2012

source: OpenAI
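
The two figures on this slide can be checked against each other. A minimal sketch, assuming steady exponential growth at the quoted doubling period; the constant and function names are illustrative, not from the talk.

```python
import math

DOUBLING_MONTHS = 3.5      # doubling period quoted on the slide
GROWTH_FACTOR = 300_000    # total growth quoted on the slide


def months_to_grow(factor, doubling_months):
    """Months needed to grow by `factor` when doubling every `doubling_months`."""
    return math.log2(factor) * doubling_months


doublings = math.log2(GROWTH_FACTOR)
months = months_to_grow(GROWTH_FACTOR, DOUBLING_MONTHS)
print(f"{doublings:.1f} doublings, about {months / 12:.1f} years")
```

A 300,000x increase is roughly 18 doublings, which at 3.5 months each takes a little over five years. That matches a 2012 starting point for a talk given in 2018.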

MCE Transforming the world

https://www.datanami.com/2018/05/11/inside-one-vc-firms-hands-on-approach-to-ai/

MCEs everywhere

Deep Learning related

NMT, Tesla, computer vision, etc.

Applied Mathematics

Computer-aided proofs, compressed sensing

Other areas

Protein design, dynamical systems, oil field development

Psychology (Choosing Prediction Over Explanation in Psychology, Yarkoni 2017)

IT-enriched Science

What does it look like? What are the grand challenges?

Data Science #21stCenturyScience

Massive Computational Experiments

Theory for guidance/interpretation

The grand challenges of #datascience2018

1. Conduct MCEs, crush other scientists, win prizes

2. Enable MCEs, win admiration of other scientists

The grand challenges of #datascience2018

2. Enable MCEs, win admiration of other scientists

1. Conduct MCEs, crush other scientists, win prizes

Typical Data Science Workflow

1. Precise specification of experiments

2. Distribution and monitoring of all jobs

3. Harvesting data

4. Analysis of data

5. Inductive iterations of 1-4 (suggested/required by 4)

6. Dissemination of acquired knowledge
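
Steps 1-4 of this workflow can be sketched on a single machine. This is a toy illustration using only the Python standard library, not the tooling used in the talk: the parameter grid, the `run_job` experiment, and all function names are assumptions made up for the example, and a thread pool stands in for a cluster.

```python
import itertools
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def specify_experiments():
    """Step 1: enumerate every (n, noise, seed) combination up front."""
    grid = {"n": [50, 200], "noise": [0.1, 1.0], "seed": [0, 1, 2]}
    keys = list(grid)
    return [dict(zip(keys, values)) for values in itertools.product(*grid.values())]


def run_job(spec):
    """One job: estimate the mean of noisy samples around a true value of 1.0."""
    rng = random.Random(spec["seed"])
    samples = [1.0 + rng.gauss(0.0, spec["noise"]) for _ in range(spec["n"])]
    return {**spec, "error": abs(statistics.mean(samples) - 1.0)}


def run_all(specs, workers=4):
    """Steps 2-3: distribute the jobs, then harvest results in submission order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_job, specs))


def analyze(results):
    """Step 4: average the error over seeds for each (n, noise) setting."""
    groups = {}
    for r in results:
        groups.setdefault((r["n"], r["noise"]), []).append(r["error"])
    return {key: statistics.mean(errs) for key, errs in groups.items()}


specs = specify_experiments()
results = run_all(specs)
summary = analyze(results)
for (n, noise), err in sorted(summary.items()):
    print(f"n={n:4d} noise={noise:.1f} mean abs error={err:.4f}")
```

Step 5 corresponds to editing the grid in `specify_experiments` after reading the summary and running again. Because every job is fully described by its spec, including the seed, a rerun reproduces the same numbers.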

How can you do MCEs painlessly?

Experiment Management System

(Painless Frameworks for Massive Experiments)

1. Systematic structure for coding and experiment definition

2. Automatic access to the cloud/HPC clusters

3. Automatic harvesting and analysis using defined tools

4. Automatic reproducibility

5. Easy sharing/collaboration/dissemination
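
One ingredient of automatic reproducibility is giving each experiment a stable identifier derived from its definition. The sketch below is one possible approach and is not taken from any of the frameworks in this talk; `experiment_id` and the spec fields are hypothetical.

```python
import hashlib
import json


def experiment_id(spec):
    """Deterministic short ID: SHA-1 of the spec serialized with sorted keys."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()[:12]


# The same settings give the same ID regardless of key order.
first = experiment_id({"model": "cnn", "lr": 0.1, "seed": 0})
second = experiment_id({"seed": 0, "lr": 0.1, "model": "cnn"})
# Changing any setting gives a different ID.
changed = experiment_id({"model": "cnn", "lr": 0.2, "seed": 0})

print(first == second, first != changed)
```

Storing code, inputs, and outputs under such an ID lets a result be traced back to the exact settings that produced it.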

Examples of Painless Framework

3 models 3 abstractions

Monajemi-Murri Model

ElastiCluster

MCEs push-button, literally!

1. Build a personal CPU/GPU cluster (~20 min)

elasticluster start gce

2. Fire up 1000s of jobs

cj parrun train.py gce

Stats285 discovers math in the cloud

50 students trained 1500 deep nets in one computing day

Each built his or her own GPU cluster on Google Cloud

Collectively discovered new phenomena in deep learning

PNAS paper in progress …

CodaLab Model

Bundles (immutable) and Worksheets

https://competitions.codalab.org

More at https://stats285.github.io

Serverless Computing: PyWren

Abstract away server provisioning

Short-lived (< 5 min), small (50 MB/deployment)

[Diagram: function and dependencies (.zip), AWS Lambda, execute in container]

futures = exec.map(function, data)
answer = exec.reduce(reduce_func, futures)

PyWren does all the work for you
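
The map-then-reduce pattern above can be tried locally without any cloud account. The sketch below is a stand-in that uses Python's standard `concurrent.futures`, not PyWren's API: a thread pool plays the role of the serverless workers, and `square` and `add` are illustrative functions. PyWren's own executor and the signature it expects for the reduce function may differ.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def square(x):
    """The per-item function, playing the role of `function` above."""
    return x * x


def add(a, b):
    """Pairwise combiner used to fold the mapped results together."""
    return a + b


data = list(range(10))

with ThreadPoolExecutor(max_workers=4) as pool:
    # Map: submit one task per item and keep a future for each.
    futures = [pool.submit(square, x) for x in data]
    # Reduce: wait for each future and fold the results into one answer.
    answer = reduce(add, (f.result() for f in futures))

print(answer)  # sum of squares of 0..9 = 285
```

The calling code only sees functions, data, and futures. Where the tasks actually run, whether local threads here or Lambda containers with PyWren, is hidden behind the executor.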

Lots of small jobs

[Figure: job timeline. Legend: host submit, Lambda start, setup done, job start, job done, results returned. Stragglers are marked.]

More at https://stats285.github.io

Conclusion

MCEs can be made painless and transparent through an Experiment Management System (EMS)

We are excited to be an enabler of this transformation

MCEs are transforming Science

clusterjob.org

pywren.io

codalab.org