AGILE DATA. Christopher Bergh, Head Chef, DataKitchen. Open Data Science Conference, Boston 2015. @opendatasci


Page 1: Agile Data

AGILE DATA

Christopher Bergh

Head Chef, DataKitchen

OPEN DATA SCIENCE CONFERENCE, BOSTON 2015

@opendatasci

Page 2: Agile Data

AGENDA

Who Am I?

What Is The Problem?

A Look At Agile Through Data Lens

How To Do Agile Data In Five Shocking Steps

Page 3: Agile Data

WHO AM I

Algorithm Nerd

Columbia, MIT, NASA-Ames; ATC Automation

Into in 1990: Fuzzy Logic, Neural Networks, Constraint Satisfaction; Unix/C

Software Nerd

CTO, Dir Engineering, VP Product Management

Into in 2000: Management of Software Teams & Startups; PowerPoint

Data Nerd

COO: ETL Engineers, Analysts & Analytic Tool

Into in 2010: W. Edwards Deming, Data, Bootstrapping; Excel Hacking

Page 4: Agile Data

AGENDA

Who Am I?

What Is The Problem?

A Look At Agile Through Data Lens

How To Do Agile Data In Five Shocking Steps

Page 5: Agile Data

SO WHAT IS THE PROBLEM?

In one word ….

Page 6: Agile Data

LOTSA Technologies in Analytics

Page 7: Agile Data

LOTSA People In Analytic Teams

DATA SCIENTIST

REPORTING ANALYST

ETL ENGINEER

DATABASE ARCHITECT

DEV OPS ENGINEER

Data Governance

Page 8: Agile Data

LOTSA Data & Analysis

ONE-OFF

REUSE

Page 9: Agile Data

LOTSA Missed Expectations

[Figure: two charts, "Business Customer Expectation" and "Analyst Reality", with segments labeled Prepare Data, Analyze, and Communicate]

The business does not think that Analysts are preparing data

Analysts don’t want to prepare data

Page 10: Agile Data

Complexity

Another Field, Software Development, Ran into the Same Problems With Complexity ...

… They Used Something Called ‘Agile’ To Solve The Problem

Page 11: Agile Data

AGENDA

Who Am I?

What Is The Problem?

A Look At Agile Through Data Lens

How To Do Agile Data In Five Shocking Steps

Page 12: Agile Data

AGILEMANIFESTO.ORG

5/31/2015 12

Page 13: Agile Data

AGILEMANIFESTO.ORG

analytics

Page 14: Agile Data

s/software/analytics/
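The slide uses sed substitution notation. As a minimal sketch of what that expression does, here is a Python equivalent applied to one value statement from agilemanifesto.org. The variable names are illustrative and not part of the talk.

```python
import re

# One value statement from agilemanifesto.org, used here as sample input.
value = "Working software over comprehensive documentation"

# Python equivalent of the sed-style expression s/software/analytics/
# (without the g flag, sed replaces only the first match on a line,
# hence count=1).
rewritten = re.sub(r"software", "analytics", value, count=1)

print(rewritten)
```

Read this way, each Agile value keeps its shape and only the subject changes from software to analytics.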

Page 15: Agile Data

PRACTICES THAT ARE EASY TO APPLY

Development Sprints

User Stories

Daily Meetings

Defined Roles

Retrospectives

Pair Programming

Burn Down Charts

Page 16: Agile Data

SOME PRACTICES HAVE BEEN DIFFICULT TO APPLY

Test Driven Development

Branching And Merging

Refactoring

Small Releases

Frequent Or Continuous Integration

Experimentation For Learning

Individual Development Environments

Page 17: Agile Data

AGILE – WHAT IS UNIQUE TO ANALYTICS?

PUT THE ANALYST AT THE CENTER

Page 18: Agile Data

AGILE – WHAT IS UNIQUE TO ANALYTICS?

ANALYTICS PERCEIVED VALUE DECAY CURVE

Page 19: Agile Data

AGENDA

Who Am I?

What Is The Problem?

A Look At Agile Through Data Lens

How To Do Agile Data In Five Shocking Steps

Page 20: Agile Data

1. MANAGE YOUR WORK LIKE CODE

Why? Your work is just code: models, transforms, etc.

Use a source code control system (like Git) to enable:

Branching

Merging

Diff
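Because analytic work is text, it can be compared line by line just like application code. The following is a minimal sketch of the "Diff" idea using Python's standard `difflib` module rather than Git itself. The SQL transform, the file name `revenue.sql`, and the branch labels are hypothetical.

```python
import difflib

# Hypothetical "before" version of a SQL transform (as on production).
old_transform = """SELECT customer_id, SUM(amount) AS revenue
FROM orders
GROUP BY customer_id
"""

# Hypothetical "after" version of the same transform (as on a feature branch).
new_transform = """SELECT customer_id, SUM(amount) AS revenue
FROM orders
WHERE status = 'complete'
GROUP BY customer_id
"""

# Produce a unified diff, the same format a source control tool would show.
diff_lines = list(difflib.unified_diff(
    old_transform.splitlines(),
    new_transform.splitlines(),
    fromfile="revenue.sql (production)",
    tofile="revenue.sql (feature branch)",
    lineterm="",
))

print("\n".join(diff_lines))
```

The diff shows a single added `WHERE` line, which is the kind of small, reviewable change that branching and merging depend on.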

Page 21: Agile Data

2. TEST AND CONTAIN

1. Create and monitor tests

2. Test on separate data from production

3. Run tests early and often

4. Target 20% of code for tests

Unit Tests & System Tests … Keep Adding & Improving

1. Break up your work into components

2. Manage the environment for each component (e.g. Docker, AMI)

3. Practice Environment Version Control
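As one possible illustration of testing on data separate from production, the sketch below runs two unit tests against a small hand-built fixture. The transform `revenue_by_customer`, the fixture `TEST_ORDERS`, and the runner `run_tests` are all hypothetical names invented for this example. A real project would more likely use a test framework.

```python
def revenue_by_customer(orders):
    """Hypothetical transform: total the amount of completed orders per customer."""
    totals = {}
    for order in orders:
        if order["status"] == "complete":
            key = order["customer_id"]
            totals[key] = totals.get(key, 0) + order["amount"]
    return totals


# Small hand-built fixture, kept separate from production data.
TEST_ORDERS = [
    {"customer_id": "a", "amount": 10, "status": "complete"},
    {"customer_id": "a", "amount": 5, "status": "complete"},
    {"customer_id": "b", "amount": 7, "status": "cancelled"},
]


def test_sums_completed_orders():
    assert revenue_by_customer(TEST_ORDERS)["a"] == 15


def test_ignores_cancelled_orders():
    assert "b" not in revenue_by_customer(TEST_ORDERS)


def run_tests():
    """Run every test and return how many passed (an assertion error stops the run)."""
    tests = [test_sums_completed_orders, test_ignores_cancelled_orders]
    for test in tests:
        test()
    return len(tests)


if __name__ == "__main__":
    print(f"{run_tests()} tests passed")
```

Tests this small are cheap to run early and often, and new cases can be added to the fixture whenever a data problem is found.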

Page 22: Agile Data

3. PROVIDE SEPARATE ENVIRONMENTS FOR ANALYSTS

Why?

Analysts need their own copy of the data to iterate, develop & explore.
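A minimal sketch of the idea, assuming a SQLite database stands in for the analytic store: the analyst receives a private copy and can change it without touching production. The table and rows are made up for illustration, and both databases are in-memory so the example is self-contained. A real setup would likely copy a warehouse schema or use containers instead.

```python
import sqlite3

# Stand-in for the production database (hypothetical table and rows).
production = sqlite3.connect(":memory:")
production.execute("CREATE TABLE orders (customer_id TEXT, amount REAL)")
production.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("a", 10.0), ("b", 7.0)],
)
production.commit()

# Give the analyst a private copy to iterate on.
sandbox = sqlite3.connect(":memory:")
production.backup(sandbox)

# The analyst experiments freely in the sandbox.
sandbox.execute("DELETE FROM orders WHERE customer_id = 'b'")
sandbox.commit()

# Production is unaffected by the experiment.
sandbox_rows = sandbox.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
production_rows = production.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(sandbox_rows, production_rows)
```

The destructive `DELETE` only changes the sandbox, so exploration carries no risk for other users of the production data.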

Page 23: Agile Data

4. SUPPORT THREE TYPES OF WORKFLOWS

Small Team: work directly on production

Feature Branch: merge back to production branch

Data Governance: 3rd party verification before production merge

Review, Test, Approve

Page 24: Agile Data

5. GIVE ANALYSTS ABILITY TO EDIT DATABASE SAFELY

Best-in-class companies take 12 days to integrate new data sources into their analytical systems; industry average companies take 60 days; and laggards average 143 days.

Source: Aberdeen Group: Data Management for BI: Fueling the analytical engine with high-octane information

Figure out how to do this in minutes

Page 25: Agile Data

CONCLUSION

Page 27: Agile Data

AGILE DATA Christopher Bergh

[email protected]

Questions?

Comments?

BOSTON 2015

@opendatasci