Best Practices: Implementing DataOps with a Data Science Platform

November 7, 2017

Learn more at datascience.com | Empower Your Data Scientists

Agenda

• Evolving data science landscape

• Data growth and impacts

• Defining DataOps

• DataOps vs. DevOps

• Best practices in applying DataOps

• Q&A

Crystal Valentine, VP Technology Strategy, MapR

William Merchan, CSO, DataScience.com

EVOLVING LANDSCAPE

DOING DATA SCIENCE HAS GROWN IN COMPLEXITY

[Diagram: the environments and tools a data science team juggles, grouped by area]

• Environments: Windows, OSX, cloud (AWS, Google, Azure), on prem, laptops, remote, security

• Notebooks: Jupyter, RStudio, Zeppelin

• Languages: Python, Scala, R, SAS

• Tools and libraries

• Sharing & collaboration (?): results and models via chat, email, .ppt; code via email, shared drives

• Deployments: monitoring, support, logging (style A, style B); tools such as PMML, Flask

• Lineage and repeatability (?)

• Data: data lake, database, data inventory; Spark, Pig, Hive; data tools, ETL, cron

• Users

DATA SCIENCE TRENDS: GROWING TEAMS & OPEN SOURCE AS THE NEW STANDARD

2017: 2,350,000 data science and analytics job listings*

*Source: Kaggle 2017 data science trend report, Burning Glass Quant Crunch Report, Microsoft Revolutions Blog 2017

DATA SCIENCE PLATFORMS ARE AN EMERGING CATEGORY BRINGING TOGETHER ESSENTIAL ELEMENTS FOR DATA SCIENCE SCALING

[Diagram: data science platforms shown alongside the surrounding categories]

• CLOUD PROVIDERS

• ETL & DATA ENGINEERING

• VERTICAL APPLICATIONS

• BI & VISUALIZATION TOOLS

• SECURITY

• INFRASTRUCTURE

• LIBRARIES

• TOOLS

• DATA PLATFORMS

• DATA SCIENCE PLATFORMS

DATA GROWTH

DATA IS THE LEVERAGE POINT FOR COMPETITIVE ADVANTAGE

DATA VOLUMES GROWING FASTER THAN MOORE’S LAW

Source: McKinsey Global Institute

[Figure: growth in data volume and data diversity over time]

• 1987: 3 exabytes of data

• 2010: 1.2 zettabytes of data

• 2020: 44 zettabytes of data

Data Diversity: emails, call detail records, clickstream data, CSV, documents, PDF, billing data, metadata, JSON, network data, mobile data, XML, product catalog, medical records, text files, video, text messages, merchant listings, sensor data, server logs, set-top box, social media, audio
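The slide title can be checked against its own numbers with a little arithmetic. The sketch below converts the three quoted volumes (3 exabytes in 1987, 1.2 zettabytes in 2010, 44 zettabytes in 2020) to exabytes and computes a compound annual growth rate for each span. Two things here are assumptions and not part of the slide: reading "Moore's law" as a doubling every two years, and the helper name `annual_growth_rate`. Since the talk dates from 2017, the 2020 figure was a projection.

```python
# Data volumes quoted on the slide, in exabytes (1 zettabyte = 1,000 exabytes).
VOLUMES_EB = {1987: 3.0, 2010: 1200.0, 2020: 44000.0}

# Assumption: "Moore's law" is read as a doubling every two years,
# which is about 41% growth per year.
MOORE_ANNUAL_RATE = 2 ** (1 / 2) - 1


def annual_growth_rate(start_year, end_year, volumes=VOLUMES_EB):
    """Compound annual growth rate between two years in the table."""
    years = end_year - start_year
    ratio = volumes[end_year] / volumes[start_year]
    return ratio ** (1 / years) - 1


if __name__ == "__main__":
    for start, end in [(1987, 2010), (2010, 2020)]:
        rate = annual_growth_rate(start, end)
        print(f"{start}-{end}: {rate:.1%} per year")
    print(f"Doubling every two years: {MOORE_ANNUAL_RATE:.1%} per year")
```

Under these assumptions the 2010 to 2020 span works out to roughly 43% per year, slightly above the roughly 41% implied by a two-year doubling, while the 1987 to 2010 span works out to roughly 30% per year. The comparison in the slide title therefore depends on which period and which doubling interval are used.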

THE VALUE OF DATA

[Figure: two charts plotting value, cost, and net value ($) against data size, one for the Legacy Value Model and one for the Next-Gen Value Model, each with an optimum marked "OPT"]

WE HAVE PASSED AN INFLECTION POINT

INVESTMENT IN NEXT-GEN VS. LEGACY TECHNOLOGIES FOR DATA

[Chart: legacy technology investment vs. next-gen technology investment, in $ (millions), against total $ growth of the IT market]

90% of data is on next-gen technology by 2020

Next-gen consists of cloud, big data, software and hardware related expenses

Source: IDC, Gartner; Analysis & Estimates: MapR

DATAOPS

DATAOPS: AN AGILE METHODOLOGY FOR DATA-DRIVEN ORGANIZATIONS

Axioms:

1. Data is central to disruptive enterprise applications

   a. Lightweight, stateless functions do not represent the majority of workloads

2. Data science and machine learning are an important paradigm

   a. Scientists become active users -- no longer just application developers

   b. Iterative workflow with different data usage patterns

3. Data volumes continue to grow

4. Moving data is a performance bottleneck

DataOps Goals:

• Continuous model deployment

• Promote repeatability

• Promote productivity -- focus on core competencies

• Promote agility

• Promote self-service

COMPARING DEVOPS AND DATAOPS: WHAT’S DIFFERENT OR THE SAME?

[Diagram: DevOps and DataOps compared by the roles involved. Roles shown: Developers & Architects, Data Engineers, Data Scientists, Security & Governance, Operations]

CONTINUOUS MODEL DEPLOYMENT

[Diagram: stages of continuous model deployment]

• Data Engineering

• Model Development

• Model Management

• Model Deployment

• Model Monitoring & Rescoring

Key Building Blocks for Agility:

1) Unified data platform

2) Data governance

3) Self-service data and compute access

4) Multitenancy and resource management
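One way to picture the five stages on this slide is as an ordered pipeline that a team runs repeatedly. The sketch below is only an illustration: the slide lists the stages but does not say how they connect, so treating them as a loop in which monitoring feeds back into data engineering is an assumption, and `Stage`, `next_stage`, `run_cycle`, and the handler callables are hypothetical names.

```python
from enum import Enum


class Stage(Enum):
    """The five stages named on the slide, in the order they appear."""
    DATA_ENGINEERING = "Data Engineering"
    MODEL_DEVELOPMENT = "Model Development"
    MODEL_MANAGEMENT = "Model Management"
    MODEL_DEPLOYMENT = "Model Deployment"
    MONITORING_AND_RESCORING = "Model Monitoring & Rescoring"


STAGES = list(Stage)


def next_stage(stage):
    """Return the stage that follows `stage`, wrapping back to the start.

    Assumption: the last stage feeds back into the first, making the
    process a loop. The slide itself only lists the stages.
    """
    index = STAGES.index(stage)
    return STAGES[(index + 1) % len(STAGES)]


def run_cycle(handlers, context):
    """Run one pass through every stage, threading `context` through.

    `handlers` maps each Stage to a hypothetical callable that takes the
    current context and returns the updated one.
    """
    for stage in STAGES:
        context = handlers[stage](context)
    return context


if __name__ == "__main__":
    def make_handler(stage):
        def handler(context):
            # Placeholder work: record that the stage ran.
            return {**context, "history": context["history"] + [stage.value]}
        return handler

    handlers = {stage: make_handler(stage) for stage in STAGES}
    result = run_cycle(handlers, {"history": []})
    print(" -> ".join(result["history"]))
```

In a real deployment each handler would be replaced by whatever the platform provides for that stage. The point of the sketch is that every stage has an explicit hand-off to the next, which is what makes the deployment process repeatable.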

BEST PRACTICES

INDUSTRY LEADING DATA SCIENCE ORGANIZATIONS ADOPTING DATAOPS

• Versioning

• Platform approach

• Team makeup and organization

• Self service

DataOps Platform Checklist

• Unified platform for all data -- historical and real-time production

• Multitenancy and resource utilization

• Single security and access model for governance and self-service access

• Enterprise-grade for mission-critical applications and open source tools

• Run compute on the data platform -- leverage data locality

Thank you!

NEW DATAOPS APPROACH FOR DATA SCIENCE TEAMS

DataOps