Data Governance: How to Converge Stakeholders Throughout the Enterprise Towards a “Single Version of the Truth”

Tom Breur [email protected]

IRM DG & MDM Conference Europe, London, 15 April 2013

www.xlntconsulting.com


Client list

© Tom Breur, 2013

Agenda
• 14:00-14:45 Data governance & data integration
• 14:45-15:30 Data quality & change management
• 15:30-15:45 tea/coffee break
• 15:45-16:30 Elusive requirements & Agile BI
• 16:30-17:15 Hyper normalized DWH Architecture


Part 1

Data Governance and the Semantic Gap


It’s a stretch…
• Volumes of data are growing (fast)!
• Variety of sources keeps expanding:
  ◦ Social media, RFID, log-files, GPS, etc.
• Business users need their data (much) sooner:
  ◦ monthly ⇒ weekly ⇒ daily ⇒ intra-day
• BI in support of operational processes calls for (near) real-time data


Information gap
• BI teams (often) find themselves straddled
• BI imperative: bridge the information gap
[Figure: the information gap. Available time to apply information ⇒ shorter; required time to gather information ⇒ longer]

Incremental data value
[Figure: layers of incremental data value]
• Insight (conformed dimensions)
• Information (integrated around Business Keys)
• Data (source system facts)


Semantic gap
• BI “carries” information: source ⇒ target
• Transformations (ETL) change ‘useless’ data (source system facts) into meaningful information (corporate reporting)
• How to perform these transformations “correctly” is non-trivial; getting requirements “just right” is near impossible (beforehand!)
• Business stakeholders specify interpretation, BI teams execute


Why data governance?
Management paradox: encouraging and leveraging the ingenuity of everyone throughout the enterprise, while ensuring compliance with overall corporate vision and principles, through accurate and consistent information provision


DG ≠ Information mgt
[Figure labels]
• Data/Info Management: managing data to achieve goals
• Governance: making sure that information is managed properly
• Data, Information, and Content Life Cycles
• DG Oversight & Direction
• DG Accountability; Ownership, Issue Resolution
• DG Responsibility; Stewardship, Monitoring
Source: Data Governance - How to Design, Deploy, and Sustain an Effective Data Governance Program, John Ladley, 2012

DG definitions (1)
• DMBOK: “Data governance is the exercise of authority, control, and shared decision making (planning, monitoring and enforcement) over the management of data assets”


DG definitions (2)
• Ladley (2012): “Data governance is the organization and implementation of policies, procedures, structure, roles, and responsibilities which outline and enforce rules of engagement, decision rights, and accountabilities for the effective management of information assets”

DG definitions (3)
• Gwen Thomas: “Data governance is a system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models which describe who can take what actions with what information, and when, under what circumstances, using what methods”


IT Governance
• Weill & Ross (2004): “IT governance is specifying the decision rights and accountability framework to encourage desirable behavior in the use of IT”
• Replace “IT” with “data” and this same definition holds for DG as well

Why define data governance?
• Any chosen definition drives the boundaries for how you can manage your DG program
• Formally “choosing” your DG definition is a significant alignment step during the launching phase of your program
• Your accountability/decision structure drives the future of DG, and determines sustainability of your change effort


Accountability framework
• Who has “a say”, e.g.: should be consulted, and/or allowed to provide input
• Who “decides” (& how), e.g.: how are decisions made, what does the decision mechanism look like

Origins of DG (1)
• Data Quality projects/initiatives
  ◦ DQ issues are the root cause for many (if not most) BI problems
  ◦ Remediating DQ issues is a prominent driver for DG programs
• MDM ensures accurate reference data
  ◦ Many seek a (one) “Golden copy”, usually originating in some central place (per entity/context)


Origins of DG (2)
• BI/Data Integration
• Two (broad) types of issues or challenges:
  ◦ DQ issues caused by poor data entry, or incorrect data capture (“DQ errors”)
  ◦ DQ issues caused by imperfect specification (“requirements”) of semantic representation (improper transformation, ETL)
    ⇒ This is where the ‘struggle’ to create a “Single version of the truth” takes place!

Two types of DQ challenges
• Organizational:
  ◦ (Slightly) different definitions for seemingly similar business entities (e.g.: “customer”)
  ◦ Misalignment of performance objectives
  ◦ Conforming dimensions is, essentially, a political process
• Technical:
  ◦ Traditional DWH architectures do not support DQ resolution and governance very well


“Organizational” challenges (1)
• (Slightly) different definitions for seemingly similar business entities (e.g.: “customer”)
  ◦ MDM programs are never fully completed, and rarely seem to come to fruition
  ◦ Common terminology and definitions across the company are (exceedingly) rare
  ◦ Finding integration points (Business Keys) is exceedingly hard
  ◦ BI teams do not “own” the semantic gap between source systems & data interpretation

“Organizational” challenges (2)
• Misalignment of performance objectives
  ◦ Stakeholders across the business receive irreconcilable targets
  ◦ Distinct business processes pursue “locally optimal” targets (and use corresponding definitions)
  ◦ Hence: their reporting needs, and their ‘outlook’ on (interpretation of!) corporate facts, will differ


“Organizational” challenges (3)
• Conforming dimensions is, essentially, a political process
  ◦ Conforming dimensions requires (exactly) matching entity definitions
  ◦ Without globally agreed upon dimensions, no “single version of the truth” is ever possible
  ◦ Dimension definition impacts (“semantic”) interpretation of source system facts
  ◦ BI teams, rather than “wait & receive”, can “drive & inform” to close these gaps
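
A small illustration of why conforming a dimension needs exactly matching entity definitions. This is only a sketch: the two departmental definitions of “customer”, the field names, and the sample records are all hypothetical and not taken from the talk.

```python
from datetime import date

# Hypothetical sample accounts; every field name is an illustrative assumption.
ACCOUNTS = [
    {"id": "C1", "signed_contract": True,  "last_order": date(2013, 3, 1)},
    {"id": "C2", "signed_contract": True,  "last_order": None},
    {"id": "C3", "signed_contract": False, "last_order": date(2013, 2, 10)},
    {"id": "C4", "signed_contract": True,  "last_order": date(2011, 6, 5)},
]

def sales_customers(accounts):
    """Sales (assumed definition): anyone with a signed contract."""
    return {a["id"] for a in accounts if a["signed_contract"]}

def finance_customers(accounts, as_of=date(2013, 4, 15)):
    """Finance (assumed definition): anyone who ordered in the last 365 days."""
    return {
        a["id"] for a in accounts
        if a["last_order"] is not None and (as_of - a["last_order"]).days <= 365
    }

sales = sales_customers(ACCOUNTS)
finance = finance_customers(ACCOUNTS)

# The symmetric difference lists the records the two silos disagree on.
disputed = sorted(sales ^ finance)
print(len(sales), len(finance), disputed)
```

Both counts are “correct” under their own definition. Listing the disputed records is one way a BI team might “drive & inform” the discussion with data instead of waiting for an agreed definition.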

“Technical” challenges (1)
• Traditional DWH architectures do not support DQ resolution and governance very well
  ◦ Data warehousing needs to bridge the “semantic gap” between source system data and corporate reporting
  ◦ ‘Inverse reporting’ of “semantic gap” (Big T) requires omnipresent traceability and (immediate) backward lineage
  ◦ Bridging the “semantic gap” requires insight into (all!) incumbent business rules


“Technical” challenges (2)
• Traditional DWH architectures do not ‘work’ very well with Agile methodologies
  ◦ Big Data is increasing the “pressure to deliver”: a range from OLAP/Dimensional to Self-Service functionality is required
  ◦ Teams cannot do “everything” for “everybody”
  ◦ (early releases of) Data need to be made available sooner
  ◦ Traditional (Inmon & Kimball) modeling approaches do not support agility!

“Technical” challenges (3)
• Data warehousing needs to embrace generation of code & data virtualization
  ◦ (required) Lower data latency is increasing the “pressure to deliver”
  ◦ DWH Automation can enable faster delivery
  ◦ (early releases of) Data need to be made available sooner, not always necessarily in dimensional format (yet)


“Technical” challenges (4)
• Traditional DWH architectures do not support DQ resolution and governance very well
  ◦ Inflexible (20th century) data modeling paradigms dictate (too) early commitments
  ◦ Agile recommends “delaying decisions to the last responsible moment”
  ◦ Only when data (in ‘some’ form) are available can conforming dimensions become an informed and democratic process

DG & change (1)
• Ladley (2012): “Data governance is … a long-term commitment to doing business differently”
• Data governance is about: changing business processes, by changing people’s behavior, through the only leverage point we can truly influence: “me”


Change framework (1)
• Choice of change tactics should be driven by prevailing conditions (quadrant I-IV)
Source: Data Quality Happyland – How to Get There, Tom Breur, 2009

Change framework (2)
• Unaware ⇒ aware
  ◦ Inform the business about costs of non-quality, missed opportunities ($), etc.
• Incompetent ⇒ competent
  ◦ Train staff, and/or provide better tools and infrastructure
• Aware ⇒ unaware
  ◦ Change accountabilities, to “build quality into the organization” (make it default)


Concluding remarks Part 1
• “Big Data” are here to stay (and let’s hope the hype passes soon)
• We face an information & semantic gap
• Data Governance implies separating data/information management from DG
• DG is a business alignment effort
• DG implies “change”; to make it ‘stick’, you need to inform, empower & redefine accountabilities


Coffee/tea break 15:30-15:45


Part 2

An Architecture in Support of Data Governance

BI Requirements (1)


BI: means and ends uncertainty

Means uncertainty
• How do we get there?
• Lack of “design patterns”
• Data integration fraught with data quality issues
• Lack of Master Data Management
• Lack of Meta Data
• No agreement on how to conform dimensions

Ends uncertainty
• Where are we going to?
• Requirements are difficult to pin down
• Diverse end-user groups
• Ambiguous business case(s)
• Scope is unclear
• Data warehouses are never “done”

Source: Agile & BI – a marriage made in heaven? Tom Breur, 2011

BI Requirements (2)
Requirements are hard! Let’s go shopping
Two possible alternatives for structured requirements gathering: BEAM★ -or- ADAPT


Why go “Agile”? (1)
• BI projects fail too often, or don’t live up to expectations
• Increasingly, BI development takes place alongside (instead of after) application engineering

Why go “Agile”? (2)
Winston Royce (1970):
[Figure: waterfall model with the phases Analysis, Design, Development, Test, Release]
“In my experience, the simpler model … [as pictured below] has never worked on large software development efforts”
[Royce subsequently went on to describe an enhanced model, which included building a prototype first and then using the prototype plus feedback between phases to build a final deployment]
Source: Managing the Development of Large Software Systems, Winston Royce, 1970


Waterfall ⇔ Agile

            Waterfall/Traditional    Agile
            (Plan Driven)            (Value Driven)
Fixed       Requirements             Resources, Date
Estimated   Resources, Date          Requirements

Agile fixes the date and resources and varies the scope
Source: Agile Software Requirements, Dean Leffingwell, 2011

Quick & Dirty ≠ Agile (1)
• www.agilemanifesto.org (principle #1): “Our highest priority is to satisfy the customer through early and continuous delivery of valuable software” [emphasis added TB]
• Creating “technical debt” stands squarely in the way of continuous delivery, and maintaining a so-called “sustainable pace”: it creates (new) legacy!


Quick & Dirty ≠ Agile (2)


BI requirements (3)
• Information products (and changes) trigger new/more change requests: new data ⇒ insights ⇒ new requirements
• Gerald M. (Jerry) Weinberg: “Without stable requirements, development can’t stabilize, either”


Inmon ⇔ Kimball (1)
[Figure: 3-tiered versus 2-tiered architecture]

Inmon ⇔ Kimball (2)

Problems with Inmon
• Uncovering the ‘correct’ 3NF model requires scarce business expertise
• Unclear where 3NF model boundaries begin and end
• Model redesigns trigger a cascading nightmare of parent-child key updates

Problems with Kimball
• Smallest unit of delivery is a Star
• Incremental growth adds prohibitive overhead
• Dimensional structure is very rigid → not conducive to expansion or change
• Conforming dimensions is hard, in particular without access to data


3NF ⇔ Dimensional (1)
Source: Adapting Data Warehouse Architecture to Benefit from Agile Methodologies, Badari Boyine & Tom Breur, 2013

3NF ⇔ Dimensional (2)
see: Kimball design tip # 149 http://www.kimballgroup.com/2012/10/02/design-tip-149-facing-the-re-keying-crisis/
this problem gets (much!) worse with multiple parent-child levels


3NF ⇔ Dimensional (3)
• Adding a new dimension, or changing granularity, requires doing the initial load again (because of the way surrogate keys are assigned)

Hyper normalized model
• Business keys, context attributes (history), and relations all have their own tables
• Appending “Supplier data” to the model (or any other new source) is guaranteed to be contained as a “local” problem (= extension) in the data model, because each of these lives in its own table
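
The “local extension” claim can be checked with a minimal sketch, using Python's built-in `sqlite3` module as a stand-in for a real warehouse. The table and column names (`hub_`, `sat_`, `link_` prefixes in Data Vault style, `load_ts`, `record_source`) are illustrative assumptions, not the model shown on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Core model: business keys, context attributes (history), and relations
# each get their own table. All names are illustrative assumptions.
conn.executescript("""
CREATE TABLE hub_customer (customer_bk TEXT PRIMARY KEY, load_ts TEXT, record_source TEXT);
CREATE TABLE sat_customer (customer_bk TEXT REFERENCES hub_customer, load_ts TEXT,
                           name TEXT, city TEXT, PRIMARY KEY (customer_bk, load_ts));
CREATE TABLE hub_product  (product_bk TEXT PRIMARY KEY, load_ts TEXT, record_source TEXT);
CREATE TABLE link_order   (customer_bk TEXT REFERENCES hub_customer,
                           product_bk TEXT REFERENCES hub_product, load_ts TEXT);
""")

def schema(connection):
    """Return {table_name: DDL} for every table currently in the database."""
    rows = connection.execute("SELECT name, sql FROM sqlite_master WHERE type = 'table'")
    return dict(rows.fetchall())

before = schema(conn)

# Appending supplier data: only new tables are created, nothing is altered.
conn.executescript("""
CREATE TABLE hub_supplier (supplier_bk TEXT PRIMARY KEY, load_ts TEXT, record_source TEXT);
CREATE TABLE sat_supplier (supplier_bk TEXT REFERENCES hub_supplier, load_ts TEXT,
                           name TEXT, PRIMARY KEY (supplier_bk, load_ts));
CREATE TABLE link_supply  (supplier_bk TEXT REFERENCES hub_supplier,
                           product_bk TEXT REFERENCES hub_product, load_ts TEXT);
""")

after = schema(conn)
unchanged = all(after[name] == ddl for name, ddl in before.items())
added = sorted(set(after) - set(before))
print(unchanged, added)
```

The existing tables keep their definitions and their data; the new source only adds a key table, a context table, and a relation table that points at an existing key.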


3-tiered BI architecture
[Figure: sources (Legacy, OLTP, ERP, LOG files, External) ⇒ ETL ⇒ Staging Area ⇒ Data Warehouse / ODS ⇒ Datamart 1, Datamart 2, Datamart n ⇒ Business Intelligence Applications; Metadata. Modeling labels in the figure: 3NF, hyper normalized, dimensional]
Source: Business Intelligence Architecture in Support of Data Quality, Tom Breur, 2013

BI value stream
[Figure: structural transformation ⇒ semantic interpretation ⇒ meaningful (& accurate) reporting]


Business rules downstream
• Data in central hub (should) represent “true” state of affairs in source system(s)
• Load all data, all the time: the good, the bad, and the ugly
• Data in corporate reports are the “sum” of transformation & interpretation
  ◦ ⇒ avoid confounding of “errors”
• Auditability of data is rarely an explicit requirement, but always an implicit one (trust!)
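
One way to picture “load everything, apply business rules downstream” is the minimal sketch below. The function names, the `record_source` and `load_ts` fields, the sample orders, and the validity rule are all assumptions made for illustration.

```python
from datetime import datetime, timezone

def load_raw(rows, record_source, hub):
    """Append every source row unchanged, tagged with its origin and load time."""
    load_ts = datetime.now(timezone.utc).isoformat()
    for row in rows:
        hub.append({"payload": dict(row), "record_source": record_source, "load_ts": load_ts})

def report_rows(hub, rule):
    """Apply a business rule downstream; keep a pointer back to the raw record."""
    out = []
    for idx, rec in enumerate(hub):
        verdict = rule(rec["payload"])
        out.append({"raw_index": idx, "record_source": rec["record_source"],
                    "accepted": verdict, **rec["payload"]})
    return out

def valid_amount(payload):
    """Hypothetical business rule: an order needs a non-negative numeric amount."""
    amount = payload.get("amount")
    return isinstance(amount, (int, float)) and amount >= 0

hub = []
load_raw([{"order": "A1", "amount": 120.0},
          {"order": "A2", "amount": -5.0},
          {"order": "A3", "amount": None}], "crm_export", hub)

rows = report_rows(hub, valid_amount)
accepted = [r["order"] for r in rows if r["accepted"]]
rejected = [r["order"] for r in rows if not r["accepted"]]
print(accepted, rejected)
```

Because the “bad” and “ugly” rows are still in the hub, a rejected report row can be traced back to exactly what the source delivered. That keeps source errors and interpretation errors from being confounded.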

Horses for courses (1)
• 3NF
  ◦ quickly & accurately capture transaction data
  ◦ easy to get data in
• Hyper normalized
  ◦ integrate historical data
  ◦ capture all data, all the time
• Dimensional
  ◦ present & analyze data
  ◦ easy to get data out


Horses for courses (2)
• 3NF: consistency, rigid model
• Hyper normalized: validity, flexibility
• Dimensional: meaning, ease of use

Backroom ⇔ Frontroom
[Figure: the 3-tiered architecture (Legacy, OLTP, ERP, LOG files, External ⇒ ETL ⇒ Staging Area ⇒ Data Warehouse / ODS ⇒ Datamart 1, Datamart 2, Datamart n ⇒ Business Intelligence Applications; Metadata), divided into a Back room (Data Warehouse Architecture) and a Front room (Business Intelligence Architecture)]


Backroom ⇔ Frontroom

Lean Manufacturing (Backroom)
• “Push” metaphor
• Focus on control & stability
• Standardization → Change management
• Persistent storage
• Hyper normalized
• Restricted access
• Change is an exception

Lean Product Development (Frontroom)
• “Pull” metaphor
• Focus on responsiveness & agility
• Flexibility → Self-service
• Volatile/Virtual
• Dimensional
• Free-for-all
• Change is the norm

Why data virtualization?
• Operational BI calls for (near) real-time data
• Data virtualization enables federation ⇒ you can delay (definitive) modeling, yet make data available (very) early on
• Adjusted deployments can be released almost immediately
• Piecemeal adjusting (“tweaking”) of misaligned Dimensions enables informed discussions ⇒ consequences (target setting)
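
A rough sketch of the “tweak the dimension without reloading” idea. A plain SQL view in an in-memory SQLite database stands in for a data virtualization layer here; a real product would federate across several sources. The table, view, and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_customer (customer_bk TEXT, country TEXT, segment TEXT);
INSERT INTO raw_customer VALUES
  ('C1', 'NL', 'retail'), ('C2', 'UK', 'retail'), ('C3', 'UK', 'wholesale');

-- First cut of the dimension: a view, so no data is copied or reloaded.
CREATE VIEW dim_customer AS
  SELECT customer_bk, country AS region FROM raw_customer;
""")

def regions(connection):
    """Count customers per region as the current view definition sees them."""
    query = "SELECT region, COUNT(*) FROM dim_customer GROUP BY region ORDER BY region"
    return connection.execute(query).fetchall()

first = regions(conn)

# After discussion the business wants region to include the segment:
# redefine the view; the stored data stays untouched.
conn.executescript("""
DROP VIEW dim_customer;
CREATE VIEW dim_customer AS
  SELECT customer_bk, country || '-' || segment AS region FROM raw_customer;
""")

second = regions(conn)
print(first, second)
```

Stakeholders can compare both versions of the report side by side before committing to a definition. That is the kind of informed discussion the slide refers to.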


Divide & Conquer (1)
• “Break down” semantic gap from Backroom to Frontroom
• Offer a range of data (self-) services:
  ◦ Source data “as is”
  ◦ Source data that have undergone cleansing
  ◦ Dimensional models
  ◦ Full-fledged BI applications
• Allow business to set (development) priorities!


Divide & Conquer (2)
• Transform (multiple) disparate (technical) source keys to meaningful Business Keys (aka “Business Vault”)
• Quickly model data (based on PK & FK) in hyper normalized structures for exploration and discovery (aka “Source Vault”)
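
The step from technical source keys to Business Keys can be sketched as a simple cross-reference lookup. The source system names, key values, and the `KEY_MAP` structure are hypothetical; in practice such a mapping would typically be maintained through MDM or stewardship processes.

```python
# Hypothetical cross-reference: (source system, technical key) -> business key.
KEY_MAP = {
    ("crm", "10045"): "CUST-0001",
    ("billing", "A-77"): "CUST-0001",
    ("crm", "10046"): "CUST-0002",
}

def to_business_key(source, technical_key, key_map=KEY_MAP):
    """Return the business key, or None when the source key is still unmapped."""
    return key_map.get((source, technical_key))

def integrate(records, key_map=KEY_MAP):
    """Group source records by business key; keep unmapped ones visible."""
    integrated, unmapped = {}, []
    for rec in records:
        bk = to_business_key(rec["source"], rec["key"], key_map)
        if bk is None:
            unmapped.append(rec)
        else:
            integrated.setdefault(bk, []).append(rec)
    return integrated, unmapped

records = [
    {"source": "crm", "key": "10045", "name": "Acme Ltd"},
    {"source": "billing", "key": "A-77", "balance": 250},
    {"source": "billing", "key": "B-12", "balance": 40},
]
integrated, unmapped = integrate(records)
print(sorted(integrated), len(integrated["CUST-0001"]), len(unmapped))
```

Unmapped records are returned rather than dropped, so gaps in the key mapping stay visible to whoever owns that mapping.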


Decomposition & synthesis
• Source elements are first decomposed, then “reassembled” (usually dimensional)
• Integration around business keys (orange) ⇒ identifiers for end-users
Source: Modeling the Agile Data Warehouse with Data Vault, Hans Hultgren, 2012

Entity ⇒ Ensemble
• 3NF Entity contains:
  ◦ Business Key
  ◦ Relations
  ◦ Descriptive attributes (= context & history)
• All broken out in separate tables within the corresponding Ensemble
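
As a minimal sketch of breaking an Entity out into an Ensemble, the function below splits one 3NF row into a business-key record, relation records, and a context record. The function name, the `load_ts` field, and the sample order row are assumptions for illustration only.

```python
def decompose(row, business_key, foreign_keys, load_ts):
    """Split one 3NF row into key, relation, and context records."""
    hub = {business_key: row[business_key], "load_ts": load_ts}
    # One relation record per foreign key found on the source row.
    links = [{business_key: row[business_key], fk: row[fk], "load_ts": load_ts}
             for fk in foreign_keys]
    # Everything that is neither key nor relation is descriptive context.
    context = {k: v for k, v in row.items()
               if k != business_key and k not in foreign_keys}
    satellite = {business_key: row[business_key], "load_ts": load_ts, **context}
    return hub, links, satellite

order = {"order_no": "SO-1", "customer_no": "CUST-0001",
         "status": "shipped", "amount": 99.5}
hub, links, satellite = decompose(order, "order_no", ["customer_no"], "2013-04-15")
print(hub, links, satellite)
```

Stamping the context record with a load timestamp is what lets history accumulate: a changed status becomes a new context row, and the key and relation records are left alone.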


Hyper normalization +/-

Pro
• Quick (and inexpensive) to add new data elements
• Independence of tables enables parallelization (fast & scalable loading)
• Amenable to automation

Con
• More joins, slower queries
• Plethora of table constructs
• Table naming becomes critical and challenging
• Model is more difficult to “read”
• Less experience/resources available (new)

Data warehouse automation
DWH automation can take on several forms:
• Standardized processes, templates, etc.
• ETL/DDL generation
  ◦ Staging
  ◦ hub (for 3-tiered DWH architectures)
  ◦ data marts
• Maintenance
  ◦ version control
  ◦ documenting “as built” design
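
DDL generation lends itself to a short sketch: table definitions are produced from metadata by a fixed template. The `MODEL` dictionary, the naming pattern, and the column choices are hypothetical, and the generated SQL is only smoke-tested against an in-memory SQLite database.

```python
import sqlite3

# Hypothetical metadata; in practice this would come from a model repository.
MODEL = {
    "customer": {"business_key": "customer_bk", "attributes": ["name", "city"]},
    "product": {"business_key": "product_bk", "attributes": ["description"]},
}

def generate_ddl(model):
    """Emit one key table and one context table per entity from a fixed template."""
    statements = []
    for entity, meta in model.items():
        bk = meta["business_key"]
        statements.append(
            f"CREATE TABLE hub_{entity} ({bk} TEXT PRIMARY KEY, "
            f"load_ts TEXT NOT NULL, record_source TEXT NOT NULL);"
        )
        columns = ", ".join(f"{attr} TEXT" for attr in meta["attributes"])
        statements.append(
            f"CREATE TABLE sat_{entity} ({bk} TEXT NOT NULL REFERENCES hub_{entity}, "
            f"load_ts TEXT NOT NULL, {columns}, PRIMARY KEY ({bk}, load_ts));"
        )
    return statements

ddl = generate_ddl(MODEL)

# Smoke test: the generated statements should at least be valid SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("\n".join(ddl))
tables = sorted(r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
print(tables)
```

Because every table follows the same few patterns, adding an entity means adding a metadata entry rather than hand-writing DDL. Identifiers are interpolated directly into the SQL here, so this sketch assumes trusted metadata.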


Colors of the Data Vault
• 3NF Entity contains:
  ◦ Business key (blue), relations (red), and descriptive attributes (yellow)
• Star schema:
  ◦ Business key & descriptive attributes (Dimension)
  ◦ Business key, relations & descriptive attributes (Fact table)

Conclusion (1)
• Most “hard” (requirements) work goes into bridging the “semantic gap”:
  ◦ opportunities abound for (slight) errors and misunderstanding
• When business silos disagree on entity definition, cracks in the corporate value flow occur ⇒ business misalignment
• Changes in entity numbers across business functions constitute a loss ($)


Conclusion (2)
• Committing (too) early to how data are modeled makes BI “owner” of the resulting solution
  ◦ That includes (all of) the data quality “errors” that result from design decisions
• Data traceability (backward lineage) enables BI to be a responsible steward, without becoming owner of (DQ) “errors”


Have your cookies and eat them too!