Agile Data Warehouse The Final Frontier Terry Bunio


Page 1: The final frontier

Agile Data Warehouse

The Final Frontier

Terry Bunio

Page 2: The final frontier

Agile Data Warehouse The Final Frontier

Page 3: The final frontier

@tbunio

bornagainagilist.wordpress.com

www.protegra.com

Terry Bunio

Page 4: The final frontier

1. Data Warehouse Architecture

2. Spectre of the Agility

3. The Prime Directive

4. Captain, we need more visualization!

5. The United Federation of Planets

Page 5: The final frontier

Data Warehouse

• Definition

– “a database used for reporting and data analysis. It is a central repository of data which is created by integrating data from multiple disparate sources. Data warehouses store current as well as historical data and are commonly used for creating trending reports for senior management reporting such as annual and quarterly comparisons.” – Wikipedia.org

Page 6: The final frontier

Data Warehouse

• Can refer to:

– Reporting Databases

– Operational Data Stores

– Data Marts

– Enterprise Data Warehouse

– Cubes

– Excel?

– Others

Page 7: The final frontier

Relational

• Relational Analysis

– Third Normal Form

– OLTP

– Normalized tables optimized for modification

Page 8: The final frontier

Dimensional

• Dimensional Analysis

– Star Schema

– OLAP

– Facts and Dimensions optimized for retrieval

• Facts – Business events – Transactions

• Dimensions – context for Transactions

– Accounts

– Products

– Date
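The facts-and-dimensions idea above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table and column names (dim_account, dim_product, dim_date, fact_transaction) and the sample rows are assumptions, not taken from the talk.

```python
import sqlite3

# In-memory database; every name and row below is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_account (account_key INTEGER PRIMARY KEY, account_name TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
-- The fact table holds one business event per row and points at its context.
CREATE TABLE fact_transaction (
    account_key INTEGER REFERENCES dim_account(account_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key    INTEGER REFERENCES dim_date(date_key),
    amount      REAL
);
""")
conn.executemany("INSERT INTO dim_account VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "Widget"), (2, "Gadget")])
conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?)",
    [(20130101, 2013, 1), (20130401, 2013, 2)],
)
conn.executemany(
    "INSERT INTO fact_transaction VALUES (?, ?, ?, ?)",
    [(1, 1, 20130101, 100.0), (1, 2, 20130401, 50.0), (2, 1, 20130401, 25.0)],
)


def sales_by_quarter(connection):
    """Aggregate the facts by one attribute of the Date dimension."""
    rows = connection.execute("""
        SELECT d.year, d.quarter, SUM(f.amount)
        FROM fact_transaction AS f
        JOIN dim_date AS d ON d.date_key = f.date_key
        GROUP BY d.year, d.quarter
        ORDER BY d.year, d.quarter
    """)
    return rows.fetchall()


quarterly = sales_by_quarter(conn)
```

The same fact table can be sliced by account or product by joining to a different dimension, which is what makes the star layout convenient for retrieval.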

Page 9: The final frontier

Relational

Page 10: The final frontier

Dimensional

Page 11: The final frontier
Page 12: The final frontier

Kimball-lytes

• Bottom-up - incremental

– Operational systems feed the Data

Warehouse

– Data Warehouse is a corporate dimensional

model that Data Marts are sourced from

– Data Warehouse is the consolidation of Data

Marts

– Sometimes the Data Warehouse is generated

from Subject area Data Marts

Page 13: The final frontier

Inmon-ians

• Top-down

– Corporate Information Factory

– Operational systems feed the Data

Warehouse

– Enterprise Data Warehouse is a corporate

relational model that Data Marts are sourced

from

– Enterprise Data Warehouse is the source of

Data Marts

Page 14: The final frontier

The gist…

• Kimball's approach is easier to implement as you are dealing with separate subject areas, but can be a nightmare to integrate

• Inmon's approach has more upfront effort to avoid these consistency problems, but takes longer to implement.

Page 15: The final frontier

Spectre of the Agility

Page 16: The final frontier

Incremental - Kimball

• In Segments

• Detailed Analysis

• Development

• Deploy

• Long Feedback loop

• Considerable changes

• Rework

• Defects

Waterfall - Inmon

• Detailed Analysis

• Large Development

• Large Deploy

• Long Feedback loop

• Extensive changes

• Many Defects

Data Warehouse Project

Page 17: The final frontier

Popular Agile Data Warehouse Pattern

• Analyze data requirements department by

department

• Create Reports and Facts and Dimensions

for each

• Integrate when you do subsequent

departments

Page 18: The final frontier
Page 19: The final frontier

The problem

• Conforming Dimensions

– A Dimension conforms when it has equivalent structure and content wherever it is used

– Is an account defined by Marketing the same

as Finance?

• Probably not

– If the Dimensions do not conform, this severely hampers the Data Warehouse
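One way to make the conformance question concrete is to compare two departments' versions of the same dimension for structure (attributes) and content (rows). The sketch below is an assumption-laden illustration: the conforms helper, the dictionary layout, and the Marketing and Finance sample data are all hypothetical.

```python
def conforms(dim_a, dim_b):
    """Return a list of differences between two versions of a dimension.

    An empty list means the versions agree in structure and content.
    """
    problems = []
    # Structure: the same attribute names with the same types.
    if dim_a["columns"] != dim_b["columns"]:
        problems.append("structure differs")
    # Content: the same business keys mapped to the same attribute values.
    keys_a, keys_b = set(dim_a["rows"]), set(dim_b["rows"])
    for key in sorted(keys_a ^ keys_b):
        problems.append(f"key {key!r} exists in only one version")
    for key in sorted(keys_a & keys_b):
        if dim_a["rows"][key] != dim_b["rows"][key]:
            problems.append(f"key {key!r} has different values")
    return problems


# Hypothetical Account dimension as two departments might define it.
marketing_account = {
    "columns": {"account_id": "text", "name": "text", "segment": "text"},
    "rows": {"A1": ("Acme", "Retail"), "A2": ("Globex", "Wholesale")},
}
finance_account = {
    "columns": {"account_id": "text", "name": "text", "cost_centre": "text"},
    "rows": {"A1": ("Acme", "CC-10"), "A3": ("Initech", "CC-20")},
}

differences = conforms(marketing_account, finance_account)
```

Running a check like this before integrating a second department gives an early, explicit list of what has to be reconciled.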

Page 20: The final frontier

Where is she?

Page 21: The final frontier

Where is the true Agility?

• Iterations not Increments

• Brutal Visibility/Visualization

• Short Feedback loops

• Just enough requirements

• Working on enterprise priorities – not just for

an individual department

Page 22: The final frontier

Our Mission

• “Data... the Final Frontier. These are the

continuing voyages of the starship Agile.

Her on-going mission: to explore strange

new projects, to seek out new value and

new clients, to iteratively go where no

projects have gone before.”

Page 23: The final frontier

The Prime Directive

Page 24: The final frontier
Page 25: The final frontier
Page 26: The final frontier

The Prime Directive

• Is a vision or philosophy that binds the

actions of Starfleet

• Can a Data Warehouse project truly be Agile without a Vision of either the Business Domain or Data Domain?

– Without one, it is essentially just an Ad Hoc Data Warehouse: separate components that may or may not fit together.

– How do we ensure we are working on the right

priorities for the entire enterprise?

Page 27: The final frontier

A new model

Page 28: The final frontier

Agile Enterprise Data Model

• Confirms the major entities and the

relationships between them

– 30-50 entities

• Confirms the Business and Data Domains

• Starts the definition of a Data Model that will

be refined over time

– Completed in 1 – 4 weeks

Page 29: The final frontier

An Agile Enterprise Data Model

• Is just enough to understand the domain so

that the iterations can proceed

• Is not mapping all the attributes

• Is not BDUF

• Is a User Story Map for the Data Domain

• Contains placeholders for refinement
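A rough sketch of what "just enough" might look like in code: entities and relationships are recorded up front, while attributes start empty as placeholders and are filled in by later iterations. The class names (Entity, EnterpriseModel) and the sample entities are illustrative assumptions, not the presenter's tooling.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    # Attributes start empty on purpose: a placeholder for later refinement.
    attributes: list = field(default_factory=list)

    @property
    def is_placeholder(self):
        return not self.attributes


@dataclass
class EnterpriseModel:
    entities: dict = field(default_factory=dict)
    relationships: set = field(default_factory=set)

    def add_entity(self, name):
        self.entities.setdefault(name, Entity(name))

    def relate(self, parent, child):
        # Both ends must already be known entities.
        if parent not in self.entities or child not in self.entities:
            raise KeyError("unknown entity in relationship")
        self.relationships.add((parent, child))

    def refine(self, name, *attributes):
        """Add attributes to an entity once an iteration needs them."""
        self.entities[name].attributes.extend(attributes)

    def placeholders(self):
        """Entities that are confirmed but not yet detailed."""
        return sorted(e.name for e in self.entities.values() if e.is_placeholder)


model = EnterpriseModel()
for entity_name in ("Account", "Invoice", "Payment"):
    model.add_entity(entity_name)
model.relate("Account", "Invoice")
model.relate("Invoice", "Payment")
model.refine("Invoice", "invoice_number", "issued_on")
```

Here model.placeholders() lists the entities still waiting for detail, which doubles as a simple view of how much of the map has been refined.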

Page 30: The final frontier

Agile Enterprise Data Model is a Data Map

Page 31: The final frontier

Agile Enterprise Data Model

• Is

– Our vision

– Our User Story Map for the Data Domain

– Guides our solution

– Our Prime Directive

– A Data Model

Page 32: The final frontier

Kimball or Inmon?

Page 33: The final frontier

Spock

• Hybrid approach

– It is only logical

– Needs of the many outweigh the needs of the

few – or the one

Page 34: The final frontier

Spock Approach

Agile Enterprise

Data Model

Page 35: The final frontier

Spock Approach

• Agile Enterprise Data Model

• Operational Data Store

• Dimensional Data Warehouse

• Reporting can then be done from:

– Operational Data Store

– Dimensional Data Warehouse

– New Data Marts
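The flow from a relational Operational Data Store to a dimensional Data Warehouse can be sketched in plain Python. Everything here is a simplified assumption for illustration: the row layouts, the function names, and the choice of sequential surrogate keys are hypothetical, and a real ETL process would also handle history and data cleansing.

```python
# Hypothetical cleansed, relational rows as they might sit in an ODS.
ods_accounts = [
    {"account_id": "A1", "name": "Acme"},
    {"account_id": "A2", "name": "Globex"},
]
ods_payments = [
    {"payment_id": 1, "account_id": "A1", "paid_on": "2013-01-15", "amount": 100.0},
    {"payment_id": 2, "account_id": "A2", "paid_on": "2013-01-20", "amount": 40.0},
    {"payment_id": 3, "account_id": "A1", "paid_on": "2013-02-03", "amount": 60.0},
]


def build_account_dimension(accounts):
    """Assign a surrogate key to each account (business key -> dimension row)."""
    return {
        row["account_id"]: {"account_key": key, "name": row["name"]}
        for key, row in enumerate(accounts, start=1)
    }


def build_payment_facts(payments, account_dim):
    """Turn each payment into a fact row that references dimension keys."""
    return [
        {
            "account_key": account_dim[p["account_id"]]["account_key"],
            "month": p["paid_on"][:7],  # simple date context: YYYY-MM
            "amount": p["amount"],
        }
        for p in payments
    ]


def total_by(facts, attribute):
    """Sum fact amounts grouped by one attribute."""
    totals = {}
    for fact in facts:
        totals[fact[attribute]] = totals.get(fact[attribute], 0.0) + fact["amount"]
    return totals


account_dim = build_account_dimension(ods_accounts)
payment_facts = build_payment_facts(ods_payments, account_dim)
monthly_totals = total_by(payment_facts, "month")
```

Operational reports can read the ODS rows directly, while analytical reports aggregate the fact rows by whichever dimension they need.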

Page 36: The final frontier

Benefits of Spock Approach

• Agile Enterprise Data Model

– Validates knowledge of Data Domain

– Ensures later increments don’t uncover data that was previously unknown and hard to integrate

• Minimizes rework

– True iterations

• Confirm at high level and then refine

Page 37: The final frontier

Benefits of Spock Approach

• Operational Data Store

– Model data relationally to provide enterprise

level operational reports

– Consolidate and cleanse data before it is

visible to end-users

– Is used to refine the Agile Enterprise Data

Model

Page 38: The final frontier

Benefits of Spock Approach

• Dimensional Data Warehouse

– Model data dimensionally to validate domain

– Able to provide data to analytical reports

– Able to provide full historical data and context

for reports

– Able to provide clients with the ability to

generate their own reports easily

Page 39: The final frontier

How do we work iteratively on

a Data Warehouse?

Page 40: The final frontier

Increments versus iterations

• Increments

– Series by series – department by department

• Iterations

– Story by story – episode by episode

• Enterprise prioritization

– Work on the highest priority for the enterprise

– Not just within each series/department
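The difference between the two orderings can be shown on a small backlog. This is a sketch under assumptions: the stories, departments, and priority numbers are invented, and a priority of 1 is taken to mean the highest enterprise priority.

```python
# Hypothetical data stories: (department, story, enterprise priority; 1 = highest).
backlog = [
    ("Finance", "Payment totals by month", 1),
    ("Finance", "Payment totals by branch", 4),
    ("Marketing", "Accounts by segment", 2),
    ("Marketing", "Campaign response", 5),
    ("Operations", "Open invoices by age", 3),
]


def incremental_order(stories):
    """Increments: finish one department completely before the next."""
    return sorted(stories, key=lambda s: (s[0], s[2]))


def iterative_order(stories):
    """Iterations: always take the highest enterprise priority next."""
    return sorted(stories, key=lambda s: s[2])


def plan_iterations(stories, capacity):
    """Split the enterprise-ordered backlog into iterations of fixed capacity."""
    ordered = iterative_order(stories)
    return [ordered[i:i + capacity] for i in range(0, len(ordered), capacity)]


first_iteration = plan_iterations(backlog, capacity=2)[0]
```

With increments, the priority-4 Finance story is delivered before the priority-2 Marketing story; with iterations, the first iteration already spans two departments.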

Page 41: The final frontier

Captain, we need more Visualization!

Page 42: The final frontier

His pattern indicates 2 dimensional thinking

Page 43: The final frontier

Data visualization

Page 44: The final frontier

Data Visualization

• Is required to:

– Provide a visual data backlog

– Provide a visualization of the data

requirements across dimensions

– Plan iterations

– Lead into the creation of FACT tables and

Dimensions in a Data Warehouse

Page 45: The final frontier

• For an Agile Data Warehouse we must think

and visualize in more dimensions

• We must create a User Story Map for Data

Requirements that is both intuitive and

informing

– Primarily for clients!

– They should be able to look at it and

understand what is currently being worked on

Page 46: The final frontier

We need a bigger metaphor

Page 47: The final frontier

Data Hexes (diagram): Invoices, Sales, Operations, Master Data, Payments, Bills, Transactions

Page 48: The final frontier

Bill Payment Hex

• Can have up to six dimensions of how

payments are sliced, diced, and aggregated

• Concentric hexes allow for the planning of

iterations
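A Data Hex could be modeled as a small data structure. The sketch below assumes that each concentric ring corresponds to one iteration and that each side of the hex carries one dimension; the DataHex class, the ring numbering, and the sample dimensions are hypothetical, since the slides show the hexes only as diagrams.

```python
from dataclasses import dataclass, field

MAX_DIMENSIONS = 6  # a hexagon has six sides


@dataclass
class DataHex:
    fact: str
    # Maps a dimension name to the ring (assumed: iteration) that delivers it;
    # ring 1 is the innermost hex.
    dimensions: dict = field(default_factory=dict)

    def add_dimension(self, name, ring):
        if name not in self.dimensions and len(self.dimensions) >= MAX_DIMENSIONS:
            raise ValueError("a hex holds at most six dimensions")
        if ring < 1:
            raise ValueError("rings are numbered from 1")
        self.dimensions[name] = ring

    def iteration_plan(self):
        """Group dimensions by ring, innermost first."""
        plan = {}
        ordered = sorted(self.dimensions.items(), key=lambda item: (item[1], item[0]))
        for name, ring in ordered:
            plan.setdefault(ring, []).append(name)
        return plan


bill_payment = DataHex("Bill Payment")
bill_payment.add_dimension("Date", 1)
bill_payment.add_dimension("Account", 1)
bill_payment.add_dimension("Payment Method", 2)
bill_payment.add_dimension("Branch", 3)
```

Calling bill_payment.iteration_plan() yields the dimensions grouped by ring, which is the information a client would read off the drawn hex.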

Page 49: The final frontier
Page 50: The final frontier
Page 51: The final frontier

Cardassian Union

Page 52: The final frontier

Be careful how you spell that…

Page 53: The final frontier

Data Modeling Union

• For too long the Data Modelers have not

been integrated with Software Developers

• Data Modelers have been like the

Cardassian Union, not integrated with the

Federation

Page 54: The final frontier

Issues

• This has led to:

– Holy wars

– Each side expecting the other to follow their

schedule

– Lack of communication and collaboration

• Data Modelers need to join the ‘United Federation of Projects’

Page 55: The final frontier

Tools of the trade

Page 56: The final frontier

Version Control

Page 57: The final frontier

Version Control

• If you don't control versions, they will control you

• Data Models must become integrated with

the source control of the project

– In the same repository as the project, with the same trunk and branches

Page 58: The final frontier

Our Version Experience

• We are using Subversion

• We are using Oracle Data Modeler as our

Modeling tool.

– It has very good integration with Subversion

– Our DBMS is SQL Server

• Unlike with other modeling tools, we were able to integrate the data model into Subversion with the rest of the project

Page 59: The final frontier

Shameless plug

• Free

• Subversion Integration

• Supports Logical, Relational, and

Dimensional data models

• Since it is free, the data models can be

shared and refined by all 60+ members of

the development team

• Currently on version 889

Page 60: The final frontier

Adaptability

Page 61: The final frontier

Change Tolerant Data Model

• Only add tables and columns when they are

absolutely required

• Leverage Data Domains so that attributes

are created consistently and can be changed

in unison
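The benefit of Data Domains can be illustrated by generating column definitions from shared domain definitions. This is a sketch only: the domain names, physical types, and the render_ddl helper are assumptions, and modeling tools typically provide their own domain feature rather than code like this.

```python
# Hypothetical data domains: one shared definition per kind of attribute.
domains = {
    "Money": "DECIMAL(12,2)",
    "ShortName": "VARCHAR(50)",
    "Identifier": "INTEGER",
}

# Columns refer to a domain by name instead of repeating a physical type.
tables = {
    "payment": {"payment_id": "Identifier", "amount": "Money"},
    "invoice": {"invoice_id": "Identifier", "total": "Money", "customer_name": "ShortName"},
}


def render_ddl(table_defs, domain_defs):
    """Generate CREATE TABLE statements by resolving each column's domain."""
    statements = []
    for table, columns in table_defs.items():
        cols = ", ".join(f"{col} {domain_defs[dom]}" for col, dom in columns.items())
        statements.append(f"CREATE TABLE {table} ({cols});")
    return statements


before = render_ddl(tables, domains)
# Changing the domain once changes every column that uses it.
domains["Money"] = "DECIMAL(18,4)"
after = render_ddl(tables, domains)
```

Because both payment.amount and invoice.total use the Money domain, a single edit widens them together.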

Page 62: The final frontier

Change Tolerant Data Model

• Don't model the data according to the application's Object Model

• Don't model the data according to source systems

• These items will change more frequently

than the actual data structure

• Your Data Model and Object Model should

be different!

Page 63: The final frontier

Re-Factoring – Read It

Page 64: The final frontier

Create the plan for how you

will re-factor

Page 65: The final frontier

Plan for:

• Versioning – Major and minor

• Adaptability – How the data model will adapt

to major changes

• Refinement – How will iterations be planned

and executed

• Re-Factoring – How will the data design be

re-factored

Page 66: The final frontier

Assimilate

Page 67: The final frontier

Assimilate

• Assimilate Version Control, Adaptability,

Refinement, and Re-Factoring into core

project activities

– Stand ups

– Continuous Integration

– Check outs and Check Ins

• Make them part of the standard activities –

not something on the side

Page 68: The final frontier

Our experience

Page 69: The final frontier

Current Stardate

• We are reaching the end of Operational Data

Store ETL Development

– ODS has been refined as we progress – no

major changes

– Data Warehouse Dimensional Model is also

complete

– Initial Reports analysis is complete and work on the report backlog will start soon

Page 70: The final frontier

Summary

• Use an Agile Enterprise Data Model to

provide the vision

• Strive for Iterations over Increments – Spock

Approach assists in this

• Use Data Hexes to provide brutal visibility

and Iteration planning

• Plan and Integrate processes for Versioning,

Adaptability, Refinement, and Re-Factoring

Page 71: The final frontier

What doesn't change?

Page 72: The final frontier

Leadership

Page 73: The final frontier

Leadership

• “If you want to build a ship, don't drum up

people together to collect wood and don't

assign them tasks and work, but rather teach

them to long for the endless immensity of the

sea.” ~ Antoine de Saint-Exupery

Page 74: The final frontier

Leadership

• “[A goalie's] job is to stop pucks, ... Well, yeah, that's part of it. But you know what else it is? ... You're trying to deliver a message to your team that things are OK back here. This end of the ice is pretty well cared for. You take it now and go. Go! Feel the freedom you need in order to be that dynamic, creative, offensive player and go out and score. ... That was my job. And it was to try to deliver a feeling.” ~ Ken Dryden

Page 75: The final frontier
Page 76: The final frontier

Agile Data Warehouse

The Final Frontier

Terry Bunio

@tbunio

bornagainagilist.wordpress.com

www.protegra.com