AGILE DATA
Christopher Bergh
Head Chef, DataKitchen
OPEN DATA SCIENCE CONFERENCE, BOSTON 2015
@opendatasci
AGENDA
Who Am I?
What Is The Problem?
A Look At Agile Through A Data Lens
How To Do Agile Data In Five Shocking Steps
WHO AM I
Algorithm Nerd
Columbia, MIT, NASA-Ames; ATC Automation
In 1990, into: Fuzzy Logic, Neural Networks, Constraint Satisfaction; Unix/C
Software Nerd
CTO, Dir Engineering, VP Product Management
In 2000, into: Management of Software Teams & Startups; PowerPoint
Data Nerd
COO: ETL Engineers, Analysts & Analytic Tool
In 2010, into: W. Edwards Deming, Data, Bootstrapping; Excel Hacking
SO WHAT IS THE PROBLEM?
In one word ….
LOTSA Technologies In Analytics
LOTSA People In Analytic Teams
DATA SCIENTIST
REPORTING ANALYST
ETL ENGINEER
DATABASE ARCHITECT
DEV OPS ENGINEER
DATA GOVERNANCE
LOTSA Data & Analysis
ONE-OFF
REUSE
LOTSA Missed Expectations
[Figure: Business Customer Expectation vs. Analyst Reality, each split into Analyze, Prepare Data, Communicate]
The business does not think that Analysts are preparing data
Analysts don’t want to prepare data
Complexity
Another Field, Software Development, Ran into the Same Problems With Complexity ...
… They Used Something Called ‘Agile’ To Solve The Problem
A LOOK AT AGILE THROUGH A DATA LENS
AGILEMANIFESTO.ORG
5/31/2015
s/software/analytics/ (read the manifesto with "analytics" in place of "software")
PRACTICES THAT ARE EASY TO APPLY
Development Sprints
User Stories
Daily Meetings
Defined Roles
Retrospectives
Pair Programming
Burn Down Charts
SOME PRACTICES HAVE BEEN DIFFICULT TO APPLY
Test Driven Development
Branching And Merging
Refactoring
Small Releases
Frequent Or Continuous Integration
Experimentation For Learning
Individual Development Environments
AGILE – WHAT IS UNIQUE TO ANALYTICS?
PUT THE ANALYST AT THE CENTER
AGILE – WHAT IS UNIQUE TO ANALYTICS?
ANALYTICS PERCEIVED VALUE DECAY CURVE
HOW TO DO AGILE DATA IN FIVE SHOCKING STEPS
1. MANAGE YOUR WORK LIKE CODE
Why? Your work is just code: models, transforms, etc.
Use a source code control system (like Git) to enable:
Branching
Merging
Diff
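As a rough illustration of why treating analytic work as code pays off, the sketch below diffs two versions of a transform using Python's standard `difflib`. The SQL text, file labels, and the `diff_versions` helper are hypothetical; in practice a source control system such as Git would produce the diff for you.

```python
import difflib
from typing import List

# Two hypothetical versions of an analytic transform, stored as plain text
# the same way they would sit in a source control repository.
OLD_TRANSFORM = """\
SELECT region, SUM(sales) AS total_sales
FROM orders
GROUP BY region
"""

NEW_TRANSFORM = """\
SELECT region, SUM(sales) AS total_sales
FROM orders
WHERE status = 'shipped'
GROUP BY region
"""


def diff_versions(old: str, new: str) -> List[str]:
    """Return a unified diff between two versions of a text artifact."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="transform.sql (production)",
            tofile="transform.sql (feature branch)",
            lineterm="",
        )
    )


changes = diff_versions(OLD_TRANSFORM, NEW_TRANSFORM)
print("\n".join(changes))
```

The only added line in the diff is the new `WHERE` clause, which is the kind of reviewable change record that branching and merging rely on.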
2. TEST AND CONTAIN
1. Create and monitor tests
2. Test on separate data from production
3. Run tests early and often
4. Target 20% of code for tests
Unit Tests & System Tests … Keep Adding & Improving
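A minimal sketch of the testing points above, assuming a hypothetical transform `total_sales_by_region` and a small hand-built data set that is kept apart from production data. The function names and test rows are illustrative only; a real project would more likely use a test runner such as `pytest` or `unittest`.

```python
# Hypothetical transform under test: total sales per region.
def total_sales_by_region(rows):
    """Sum the 'sales' field per 'region', skipping rows with missing sales."""
    totals = {}
    for row in rows:
        if row["sales"] is None:
            continue
        totals[row["region"]] = totals.get(row["region"], 0) + row["sales"]
    return totals


# Small, hand-built test data kept separate from production data.
TEST_ROWS = [
    {"region": "east", "sales": 100},
    {"region": "east", "sales": 50},
    {"region": "west", "sales": 70},
    {"region": "west", "sales": None},
]


def test_totals_are_summed_per_region():
    assert total_sales_by_region(TEST_ROWS) == {"east": 150, "west": 70}


def test_empty_input_gives_empty_result():
    assert total_sales_by_region([]) == {}


def run_tests():
    """Run every test and return the names of the ones that failed."""
    failures = []
    tests = (test_totals_are_summed_per_region, test_empty_input_gives_empty_result)
    for test in tests:
        try:
            test()
        except AssertionError:
            failures.append(test.__name__)
    return failures


failed = run_tests()
print("failed tests:", failed)
```

Because the tests run in well under a second, they can be run on every change, which is one way to read "early and often".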
1. Break up your work into components
2. Manage the environment for each component (e.g. Docker, AMI)
3. Practice Environment Version Control
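One possible reading of "environment version control" is sketched below: each component's pinned environment spec is kept as text, and a fingerprint of that text reveals when an environment has drifted. The component names, package pins, and the `fingerprint` and `drifted` helpers are assumptions for illustration; tools such as Docker image tags or AMI IDs play this role in practice.

```python
import hashlib

# Hypothetical pinned environment specs, one per component, kept as text
# in source control next to the component's code.
ENVIRONMENTS = {
    "etl": "python==3.11\npandas==2.2.0\n",
    "reporting": "python==3.11\nmatplotlib==3.8.0\n",
}


def fingerprint(spec: str) -> str:
    """Return a short, stable fingerprint of an environment spec."""
    return hashlib.sha256(spec.encode("utf-8")).hexdigest()[:12]


def drifted(recorded: dict, current: dict) -> list:
    """List components whose current spec no longer matches the recorded one."""
    return sorted(
        name
        for name, spec in current.items()
        if fingerprint(spec) != recorded.get(name)
    )


# Fingerprints recorded when the environments were last reviewed.
recorded = {name: fingerprint(spec) for name, spec in ENVIRONMENTS.items()}

# Simulate someone upgrading a package in one component only.
current = dict(ENVIRONMENTS, etl="python==3.11\npandas==2.2.1\n")
print(drifted(recorded, current))
```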
3. PROVIDE SEPARATE ENVIRONMENTS FOR ANALYSTS
Why?
Analysts need their own copy of the data to iterate, develop & explore.
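The idea of a private environment per analyst can be sketched with SQLite, assuming an in-memory database stands in for production. The table, rows, and `make_sandbox` helper are hypothetical; `sqlite3.Connection.backup` is the standard-library call that copies one database into another.

```python
import sqlite3

# Stand-in for the production database (in memory for this sketch).
production = sqlite3.connect(":memory:")
production.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
production.executemany(
    "INSERT INTO sales VALUES (?, ?)", [("east", 100), ("west", 70)]
)
production.commit()


def make_sandbox(source: sqlite3.Connection) -> sqlite3.Connection:
    """Copy the source database into a private in-memory database."""
    sandbox = sqlite3.connect(":memory:")
    source.backup(sandbox)
    return sandbox


analyst_db = make_sandbox(production)

# The analyst experiments freely; only the private copy changes.
analyst_db.execute("DELETE FROM sales WHERE region = 'west'")
analyst_db.commit()

sandbox_rows = analyst_db.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
production_rows = production.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(sandbox_rows, production_rows)
```

The delete shows up only in the sandbox, so exploration cannot damage the shared data.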
4. SUPPORT THREE TYPES OF WORKFLOWS
Small Team
Work directly on production
Feature Branch
Merge back to production branch
Data Governance
3rd party verification before production merge
Review
Test
Approve
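The three workflows can be thought of as three different gates in front of production. The sketch below is one hedged way to express that; the `Workflow`, `Change`, and `can_reach_production` names are hypothetical, and the rule that a feature branch merges once its tests pass is an assumption not stated on the slide.

```python
from dataclasses import dataclass
from enum import Enum


class Workflow(Enum):
    SMALL_TEAM = "small team"
    FEATURE_BRANCH = "feature branch"
    DATA_GOVERNANCE = "data governance"


@dataclass
class Change:
    """Hypothetical record of a proposed change to production analytics."""

    tests_passed: bool = False
    reviewed: bool = False
    approved_by_third_party: bool = False


def can_reach_production(workflow: Workflow, change: Change) -> bool:
    """Decide whether a change may land in production under a workflow."""
    if workflow is Workflow.SMALL_TEAM:
        # Work happens directly on production, so there is no merge gate.
        return True
    if workflow is Workflow.FEATURE_BRANCH:
        # Assumption: a branch merges back once its tests pass.
        return change.tests_passed
    # Data governance: review, test, and third-party approval are all required.
    return (
        change.tests_passed
        and change.reviewed
        and change.approved_by_third_party
    )


tested_only = Change(tests_passed=True)
print(can_reach_production(Workflow.FEATURE_BRANCH, tested_only))
print(can_reach_production(Workflow.DATA_GOVERNANCE, tested_only))
```

A change that has only passed its tests clears the feature-branch gate but not the data-governance gate.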
5. GIVE ANALYSTS ABILITY TO EDIT DATABASE SAFELY
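The slide does not say what "safely" involves, so the following is only one possible interpretation: wrap each edit in a transaction and roll it back if it touches more rows than the analyst expected. The table, the `safe_update` helper, and the `max_rows` guard are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "east"), (2, "west"), (3, "west")],
)
conn.commit()


def safe_update(conn, sql, params, max_rows):
    """Apply an edit, but roll it back if it touches more rows than expected."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            cursor = conn.execute(sql, params)
            if cursor.rowcount > max_rows:
                raise ValueError(
                    f"edit touched {cursor.rowcount} rows, "
                    f"expected at most {max_rows}"
                )
        return True
    except ValueError:
        return False


# A targeted edit is applied.
applied = safe_update(
    conn, "UPDATE customers SET region = ? WHERE id = ?", ("north", 1), max_rows=1
)
# An edit that forgot its WHERE clause is rejected and rolled back.
too_broad = safe_update(
    conn, "UPDATE customers SET region = ?", ("north",), max_rows=1
)
regions = [row[0] for row in conn.execute("SELECT region FROM customers ORDER BY id")]
print(applied, too_broad, regions)
```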
Best-in-class companies take 12 days to integrate new data sources into their analytical systems; industry-average companies take 60 days; and laggards average 143 days.
Source: Aberdeen Group: Data Management for BI: Fueling the analytical engine with high-octane information
Figure out how to do this in minutes
CONCLUSION