The Rise of the DataOps - Dataiku - J On the Beach 2016

Page 1: The Rise of the DataOps - Dataiku - J On the Beach 2016

The Role of the DevOps in the Data Analytics Teams

J ON THE BEACH 05/21/16

MORPHED WITH DEEP LEARNING™

TYPICAL OPS GUY (source: Reddit)

TYPICAL YOUNG DATA SCIENTIST (source: Common Sense)

Page 2: The Rise of the DataOps - Dataiku - J On the Beach 2016

My initial interests

• Type Systems
• Automated Proving
• Abstract Program Interpretation
• Functional Programming
• Garbage Collection and VMs
• Graph Analytics
• Chess AI
• Natural Language Processing
• 80% Emacs / 20% VIM

Page 3: The Rise of the DataOps - Dataiku - J On the Beach 2016

So to sum it up …

I (USED TO?) BE A BIG NERD

Page 4: The Rise of the DataOps - Dataiku - J On the Beach 2016

Collaboration

CLICKERS CODERS

Software is a Human Problem

I ended up building collaborative software for data science ...

Page 5: The Rise of the DataOps - Dataiku - J On the Beach 2016

DEV OPS && DATA

Page 6: The Rise of the DataOps - Dataiku - J On the Beach 2016

Let’s get back to the (brief) history of DevOps

Agile Conference, 2008

Scrum, and Agile in an operational context

Hey! We should have our own Velocity in Belgium

10+ Deploys Per Day: Dev and Ops Cooperation at Flickr

O’Reilly Velocity, June 2009

Patrick Debois

2007

Dev

Ops

QA

DevOpsDays

Ghent, October 2009

Page 7: The Rise of the DataOps - Dataiku - J On the Beach 2016

DevOps

DevOps is the practice of operations and development engineers participating together in the entire service lifecycle, from design through the development process to production support.

DevOps is also characterized by operations staff making use of many of the same techniques as developers for their systems work.

Invite Ops to the Dev Meeting

Oh. And let them SPEAK

Ops should know how to code

Page 8: The Rise of the DataOps - Dataiku - J On the Beach 2016

Let’s take an example: John, a DevOps from 2009

Learnt Python the Hard Way

Started with Puppet 1.0

Used EC2 before ELB and EBS !

Page 9: The Rise of the DataOps - Dataiku - J On the Beach 2016

Hegelian perspective

Conflict and Frustration

Concept Combination

Catharsis

Create Culture

Share

Create Tools

Dev + Ops

Page 10: The Rise of the DataOps - Dataiku - J On the Beach 2016

There have been ops associated with data for a while ?

It’s called Business Intelligence !

Page 11: The Rise of the DataOps - Dataiku - J On the Beach 2016

History of Data Analytics (Oversimplified)

2013 2014 2015 2016 2017 2018

Moving to a world of automated decision making

DATA FOR MORE INSIGHTS

DATA FOR AUTOMATED DECISIONS

Page 12: The Rise of the DataOps - Dataiku - J On the Beach 2016

The Age Of Distributed Intelligence

Global, Personalised and Real-Time Data-Driven Services

Page 13: The Rise of the DataOps - Dataiku - J On the Beach 2016

Data, Analytics and Data Science

Conflict and Frustration

Concept Combination

Catharsis

Create Culture

Share

Create Tools

Data + Science

Page 14: The Rise of the DataOps - Dataiku - J On the Beach 2016

Welcome to Technoslavia !

Page 15: The Rise of the DataOps - Dataiku - J On the Beach 2016

Classic Business Intelligence Team Organization

Business Leader Data Consumer

Line-of-business Data Consumer

Business Project Sponsor

BI Solution Architect

Model Designer

ETL Developer

Dashboard / Report Designer

SpecsDim

Big Boss

Page 16: The Rise of the DataOps - Dataiku - J On the Beach 2016

Data Science Team Organization

Business Leader Data Consumer

Line-of-business Data Consumer

Business Project Sponsor

Data Engineer

Data Analyst

System Engineer / Data Architect

Business Needs

Data Scientist

IT Constraints

I.T.

Page 17: The Rise of the DataOps - Dataiku - J On the Beach 2016

Is there room for a new role ?

Data Plumber

Data Engineer

Data Scientist

Data Waiter

Data Cleaner

Data Analyst

REAL JOB

DREAM JOB

DevOps For Data?

Page 18: The Rise of the DataOps - Dataiku - J On the Beach 2016

Imagine a company building

a new “smart car” app: AutoFine™

“Revolutionary collaborative network that checks the quality of your driving and punishes you with virtual fines if you’re a bad driver”

Page 19: The Rise of the DataOps - Dataiku - J On the Beach 2016

Imagine a company building

a new “smart car” service: AutoFine™

10 TB of Data Every Month

Hive / Spark / Python

10 Different Predictive Models

Real-Time API / Workflow

Page 20: The Rise of the DataOps - Dataiku - J On the Beach 2016

????

????

OPERATIONS: Who is responsible for …

• Checking that the newly trained model performs as expected
• Checking that the product catalog and the website tags remain consistent
• Checking that the Hadoop cluster scales as expected and has enough bandwidth to handle the workload
• Testing the performance of the real-time API
• Monitoring the performance of the model and deciding to rollback / maintain / rollout
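The last responsibility in the list above, deciding whether to rollback, maintain or rollout a model, can be written down as an explicit rule so that the decision is reproducible. The following is only a minimal sketch: the `ModelMetrics` class, the `deployment_decision` function and both thresholds are hypothetical names and values, not something taken from the talk, and it assumes a single score (higher is better) measured on the same holdout data for both models.

```python
from dataclasses import dataclass


@dataclass
class ModelMetrics:
    """One evaluation score (higher is better) for a named model version."""
    name: str
    score: float


def deployment_decision(current: ModelMetrics, candidate: ModelMetrics,
                        min_gain: float = 0.01, max_drop: float = 0.02) -> str:
    """Return 'rollout', 'maintain' or 'rollback' for a newly trained model.

    min_gain and max_drop are illustrative thresholds (assumptions).
    """
    delta = candidate.score - current.score
    if delta >= min_gain:
        return "rollout"    # candidate is clearly better: promote it
    if delta <= -max_drop:
        return "rollback"   # candidate is clearly worse: revert to the current model
    return "maintain"       # difference is within the tolerated band: change nothing
```

Keeping the rule in code rather than in someone's head also answers the "who is responsible" question: whoever owns this function and its thresholds owns the decision.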

Page 21: The Rise of the DataOps - Dataiku - J On the Beach 2016
Page 22: The Rise of the DataOps - Dataiku - J On the Beach 2016

DATA OPS

As a Philosophy

Page 23: The Rise of the DataOps - Dataiku - J On the Beach 2016

X OPS PHILOSOPHY

Highly consensual

Highly controversial

Page 24: The Rise of the DataOps - Dataiku - J On the Beach 2016

Create an API culture

Do not share
• Random pieces of code
• Flat files
• Emails

Do share
• Reproducible, documented workflows
• Clean, documented APIs
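As an illustration of what a "clean, documented API" could look like compared with a random piece of code sent by email, here is a small sketch built on the AutoFine™ example. The `Trip` type, the `virtual_fine` function and the `rate_per_kmh` parameter are all hypothetical and invented for this example; the point is only that inputs are typed, validated and documented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trip:
    """Summary of one trip (hypothetical schema)."""
    driver_id: str
    max_speed_kmh: float
    speed_limit_kmh: float


def virtual_fine(trip: Trip, rate_per_kmh: float = 2.0) -> float:
    """Return the virtual fine for a trip, or 0.0 if the limit was respected.

    The fine is the speed excess in km/h multiplied by rate_per_kmh
    (an illustrative pricing rule, not one from the talk).
    Raises ValueError when the speed limit is not positive.
    """
    if trip.speed_limit_kmh <= 0:
        raise ValueError("speed_limit_kmh must be positive")
    excess = max(0.0, trip.max_speed_kmh - trip.speed_limit_kmh)
    return excess * rate_per_kmh
```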

Page 25: The Rise of the DataOps - Dataiku - J On the Beach 2016

Defensive Data Programming

• Software has errors.
• You are not your software, yet you are responsible for the errors.
• You can never remove the errors, only reduce their probability.

Page 26: The Rise of the DataOps - Dataiku - J On the Beach 2016

Defensive Data Programming

• Handle the case when one of the input files is empty
• Handle the case when a new value appears
• Handle the case when two columns become completely correlated
• Handle the case when a column is 16k long
• Etc.
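The cases listed above can be turned into explicit checks that run before any model sees the data. This is a minimal sketch using only the standard library. The function names, the row format (a list of dicts) and the reading of "16k long" as a string value of 16,000 characters or more are assumptions made for the example.

```python
import math


def check_rows(rows, known_values, max_len=16_000):
    """Return a list of human-readable problems found in `rows`.

    rows: list of dicts, one per record.
    known_values: {column: set of values seen so far} (hypothetical reference data).
    """
    if not rows:
        return ["input is empty"]
    problems = []
    for i, row in enumerate(rows):
        for col, value in row.items():
            if isinstance(value, str) and len(value) >= max_len:
                problems.append(f"row {i}: column {col!r} is {len(value)} characters long")
            if col in known_values and value not in known_values[col]:
                problems.append(f"row {i}: unseen value {value!r} in column {col!r}")
    return problems


def pearson(xs, ys):
    """Pearson correlation of two numeric columns; 0.0 if either is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def fully_correlated(xs, ys, tol=1e-9):
    """True when two columns are (anti-)correlated up to the tolerance."""
    return abs(pearson(xs, ys)) >= 1 - tol
```

A workflow could call `check_rows` on every incoming batch and refuse to continue, or alert someone, when the returned list is not empty.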

Page 27: The Rise of the DataOps - Dataiku - J On the Beach 2016

Monitoring : the alerts for people who love it

• Performance …
• Time Spent …
• Number of Errors …
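Time spent and number of errors are cheap to collect at the level of each processing step. The sketch below assumes a plain in-process dictionary as the metrics store; the `monitored` decorator, the `STATS` dictionary and the `parse_speed` step are hypothetical names, and a real setup would forward these counters to whatever monitoring system is already in place.

```python
import time
from collections import defaultdict
from functools import wraps

# Per-function counters: number of calls, number of errors, cumulated seconds.
STATS = defaultdict(lambda: {"calls": 0, "errors": 0, "seconds": 0.0})


def monitored(func):
    """Record call count, error count and time spent for `func` in STATS."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        stats = STATS[func.__name__]
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            stats["errors"] += 1
            raise  # monitoring must not swallow the error
        finally:
            stats["calls"] += 1
            stats["seconds"] += time.perf_counter() - start
    return wrapper


@monitored
def parse_speed(raw):
    """Example step (hypothetical): parse a speed reading sent as text."""
    return float(raw)
```

After a run, `STATS["parse_speed"]` holds the three counters for that step, which is enough to raise an alert when the error rate or the time spent drifts.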

Page 28: The Rise of the DataOps - Dataiku - J On the Beach 2016

Monitoring : Business Informal Monitoring

• % Opening
• Market Spent
• Exception User Events …

Page 29: The Rise of the DataOps - Dataiku - J On the Beach 2016

Resource Allocation

I’ve got this strange error “OutOfMemory”. Do you know what it is ?

Why is the Hadoop Cluster going slower than my laptop ?

Page 30: The Rise of the DataOps - Dataiku - J On the Beach 2016

The Philosophy of pre-allocating more resources than necessary

Page 31: The Rise of the DataOps - Dataiku - J On the Beach 2016

Get to the latest package culture …

Data Scientist

I need the latest version of scikit and NetworkX ….

And could you repackage that to enable TensorFlow optimizations ?

System Administrator

…..

Page 32: The Rise of the DataOps - Dataiku - J On the Beach 2016

The culture of containers

Developers’ Sandbox

Page 33: The Rise of the DataOps - Dataiku - J On the Beach 2016

DATA OPS

As a Job Title

Page 34: The Rise of the DataOps - Dataiku - J On the Beach 2016

Job Title : a matter of name, $$ and social ladder

Data scientist Data Ops

Developer

Statistician

Full Stack Developer

Sys Admin

DevOps

Page 35: The Rise of the DataOps - Dataiku - J On the Beach 2016

Job Role : A matter of Do or Don’t

DO: Things you really want to do

DON’T: Things you really don’t want to get into

Page 36: The Rise of the DataOps - Dataiku - J On the Beach 2016

FIGHT THE TOY PLATFORM ANTI-PATTERN

Test and Invest in Infrastructure == Skilled People

or

Go For Cloud / Packaged Infrastructure

Your brand-new Hadoop cluster is perceived as slow, not much used and not reliable

Page 37: The Rise of the DataOps - Dataiku - J On the Beach 2016

FIGHT THE TECHNO MISMATCH ANTI-PATTERN

Assume Being Polyglot

or

Be a Dictator

VS

VS

The Python Clan

The R Tribe

The Old Elephant Fraternity

The New Elephant Club

Page 38: The Rise of the DataOps - Dataiku - J On the Beach 2016

GETTING DATA POLITICS

> DATA NOT AVAILABLE

Page 39: The Rise of the DataOps - Dataiku - J On the Beach 2016

GETTING DATA POLITICS

THE FOX

Hunt for Big Problem!

Convince the CEO that you can solve a business-critical problem, and use it as an excuse to get all the data you want !

THE SPIDER

Create Network !

Create a set of trackers or addictive data collection internally to get data on your side !

Page 40: The Rise of the DataOps - Dataiku - J On the Beach 2016

PREDICTIVE ANALYTICS DEPLOYMENT STRATEGY

Website

2000s winners

Companies that were able to release fast

"Artificial Intelligence with Data for Internet of Things"

2010s winners

Companies able to put intelligence in production

?

Design a way to put “PREDICTIVE MODELS” IN PRODUCTION

Page 41: The Rise of the DataOps - Dataiku - J On the Beach 2016

OWN ANONYMISATION / PRIVACY / DATA SECURITY WITH PARTNERS ISSUES

Technical Feasibility ? What can or cannot be done ?

Page 42: The Rise of the DataOps - Dataiku - J On the Beach 2016

Let’s Wrap IT Up !

A company building a GPS-powered automated car fine system

10 TB of Data Every Month

Hive / Spark / Python

10 Different Predictive Models

Real-Time API / Workflow

Robust Workflow with Data Quality Checks

Functional Monitoring by Business People through Slack and Dashboards

Monitoring for the API

Feature Engineering Pipeline in Python

Page 43: The Rise of the DataOps - Dataiku - J On the Beach 2016

But you, where do you stand ?

???? ???? ???? ?????

What's your roll-back strategy like?

What kind of multi-variate testing or strategies do you have in place for predictive models?

How do you manage the robustness of your data flow production scripts?

How can business people monitor the performance of the application?

Page 44: The Rise of the DataOps - Dataiku - J On the Beach 2016

http://bit.ly/production-survey

Food for thought: www.dataiku.com/blog

THANK YOU!