How to Converge on a "Single Version of the Truth"
15-April-2013
www.xlntconsulting.com 1
Data Governance: How to Converge Stakeholders
Throughout the Enterprise Towards a “Single Version of the Truth”
Tom Breur [email protected]
IRM DG & MDM Conference Europe London, 15 April 2013
Client list
3 © Tom Breur, 2013
Agenda
14:00-14:45 Data governance & data integration
14:45-15:30 Data quality & change management
15:30-15:45 tea/coffee break
15:45-16:30 Elusive requirements & Agile BI
16:30-17:15 Hyper normalized DWH Architecture
Part 1
Data Governance and the Semantic Gap
It’s a stretch…
• Volumes of data are growing (fast)!
• Variety of sources keeps expanding: social media, RFID, log-files, GPS, etc.
• Business users need their data (much) sooner: monthly ⇒ weekly ⇒ daily ⇒ intra-day
• BI in support of operational processes calls for (near) real-time data
Information gap
• BI teams (often) find themselves straddled
• BI imperative: bridge the information gap
[Figure: available time to apply information ⇒ shorter; required time to gather information ⇒ longer; the difference is the information gap]
Incremental data value
[Figure levels, as listed: Insight (conformed dimensions); Information (integrated around Business Keys); Data (source system facts)]
Semantic gap
• BI “carries” information: source ⇒ target
• Transformations (ETL) change ‘useless’ data (source system facts) into meaningful information (corporate reporting)
• How to perform these transformations “correctly” is non-trivial; getting requirements “just right” is near impossible (beforehand!)
• Business stakeholders specify interpretation, BI teams execute
Why data governance?
Management paradox:
• encouraging and leveraging the ingenuity of everyone throughout the enterprise,
• while ensuring compliance with overall corporate vision and principles,
• through accurate and consistent information provision
DG ≠ Information mgt
• Data/Info Management – managing data to achieve goals
• Governance – making sure that information is managed properly
[Figure labels: Data, Information, and Content Life Cycles; DG Oversight & Direction; DG Accountability: Ownership, Issue Resolution; DG Responsibility: Stewardship, Monitoring]
Source: Data Governance - How to Design, Deploy, and Sustain an Effective Data Governance Program, John Ladley, 2012
DG definitions (1)
DMBOK:
“Data governance is the exercise of authority, control, and shared decision making (planning, monitoring and enforcement) over the management of data assets”
DG definitions (2)
Ladley (2012):
“Data governance is the organization and implementation of policies, procedures, structure, roles, and responsibilities which outline and enforce rules of engagement, decision rights, and accountabilities for the effective management of information assets”
DG definitions (3)
Gwen Thomas:
“Data governance is a system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models which describe who can take what actions with what information, and when, under what circumstances, using what methods”
IT Governance
Weill & Ross (2004):
“IT governance is specifying the decision rights and accountability framework to encourage desirable behavior in the use of IT”
• Replace “IT” with “data” and this same definition holds for DG as well
Why define data governance?
• Any chosen definition drives the boundaries for how you can manage your DG program
• Formally “choosing” your DG definition is a significant alignment step, during the launching phase of your program
• Your accountability/decision structure drives the future of DG, and determines sustainability of your change effort
Accountability framework
• Who has “a say”, e.g.: should be consulted, and/or allowed to provide input
• Who “decides” (& how), e.g.: how are decisions made, what does the decision mechanism look like
Origins of DG (1)
• Data Quality projects/initiatives
  • DQ issues are the root cause for many (if not most) BI problems
  • Remediating DQ issues is a prominent driver for DG programs
• MDM ensures accurate reference data
  • Many seek a (one) “Golden copy”, usually originating in some central place (per entity/context)
Origins of DG (2)
BI/Data Integration
Two (broad) types of issues or challenges:
• DQ issues caused by poor data entry, or incorrect data capture (“DQ errors”)
• DQ issues caused by imperfect specification (“requirements”) of semantic representation (improper transformation, ETL)
⇒ This is where the ‘struggle’ to create a “Single version of the truth” takes place!
Two types of DQ challenges
Organizational:
• (Slightly) different definitions for seemingly similar business entities (e.g.: “customer”)
• Misalignment of performance objectives
• Conforming dimensions is –essentially– a political process
Technical:
• Traditional DWH architectures do not support DQ resolution and governance very well
“Organizational” challenges (1)
(Slightly) different definitions for seemingly similar business entities (e.g.: “customer”)
• MDM programs are never fully completed, and rarely seem to come to fruition
• Common terminology and definitions across the company are (exceedingly) rare
• Finding integration points (Business Keys) is exceedingly hard
• BI teams do not “own” the semantic gap between source systems & data interpretation
“Organizational” challenges (2)
Misalignment of performance objectives
• Stakeholders across the business receive irreconcilable targets
• Distinct business processes pursue “locally optimal” targets (and use corresponding definitions)
• Hence: their reporting needs, and their ‘outlook’ on (interpretation of!) corporate facts, will differ
“Organizational” challenges (3)
Conforming dimensions is –essentially– a political process
• Conforming dimensions requires (exactly) matching entity definitions
• Without globally agreed upon dimensions, no “single version of the truth” is ever possible
• Dimension definition impacts (“semantic”) interpretation of source system facts
• Rather than “wait & receive”, BI teams can “drive & inform” to close these gaps
“Technical” challenges (1)
Traditional DWH architectures do not support DQ resolution and governance very well
• Data warehousing needs to bridge the “semantic gap” between source system data and corporate reporting
• ‘Inverse reporting’ of the “semantic gap” (Big T) requires omnipresent traceability and (immediate) backward lineage
• Bridging the “semantic gap” requires insight into (all!) incumbent business rules
“Technical” challenges (2)
Traditional DWH architectures do not ‘work’ very well with Agile methodologies
• Big Data is increasing the “pressure to deliver”: a range from OLAP/Dimensional to Self-Service functionality is required
• Teams cannot do “everything” for “everybody”
• (early releases of) Data need to be made available sooner
• Traditional (Inmon & Kimball) modeling approaches do not support agility!
“Technical” challenges (3)
Data warehousing needs to embrace generation of code & data virtualization
• (required) Lower data latency is increasing the “pressure to deliver”
• DWH Automation can enable faster delivery
• (early releases of) Data need to be made available sooner, not always necessarily in dimensional format (yet)
“Technical” challenges (4)
Traditional DWH architectures do not support DQ resolution and governance very well
• Inflexible (20th century) data modeling paradigms dictate (too) early commitments
• Agile recommends “delaying decisions to the last responsible moment”
• Only when data (in ‘some’ form) are available can conforming dimensions become an informed and democratic process
DG & change (1)
Ladley (2012):
“Data governance is … a long-term commitment to doing business differently”
Data governance is about:
• changing business processes,
• by changing people’s behavior,
• through the only leverage point we can truly influence: “me”
Change framework (1)
• Choice of change tactics should be driven by prevailing conditions (quadrant I-IV)
Source: Data Quality Happyland – How to Get There, Tom Breur, 2009
Change framework (2)
• Unaware ⇒ aware
  • Inform the business about costs of non-quality, missed opportunities ($), etc.
• Incompetent ⇒ competent
  • Train staff, and/or provide better tools and infrastructure
• Aware ⇒ unaware
  • Change accountabilities, to “build quality into the organization” (make it default)
Concluding remarks Part 1
• “Big Data” are here to stay (and let’s hope the hype passes soon)
• We face an information & semantic gap
• Data Governance implies separating data/information management from DG
• DG is a business alignment effort
• DG implies “change”; to make it ‘stick’, you need to inform, empower & redefine accountabilities
Coffee/tea break 15:30-15:45
Part 2
An Architecture in Support of Data Governance
BI Requirements (1)
BI: means and ends uncertainty
Means uncertainty: how do we get there?
• Lack of “design patterns”
• Data integration fraught with data quality issues
• Lack of Master Data Management
• Lack of Meta Data
• No agreement on how to conform dimensions
Ends uncertainty: where are we going to?
• Requirements are difficult to pin down
• Diverse end-user groups
• Ambiguous business case(s)
• Scope is unclear
• Data warehouses are never “done”
Source: Agile & BI – a marriage made in heaven?, Tom Breur, 2011
BI Requirements (2)
• Requirements are hard! Let’s go shopping
• Two possible alternatives for structured requirements gathering: BEAM★ -or- ADAPT
Why go “Agile”? (1)
• BI projects fail too often, or don’t live up to expectations
• Increasingly, BI development takes place alongside (instead of after) application engineering
Why go “Agile”? (2)
Winston Royce (1970):
“In my experience, the simpler model … [as pictured below] has never worked on large software development efforts”
[Figure: waterfall phases Analysis ⇒ Design ⇒ Development ⇒ Test ⇒ Release]
[Royce subsequently went on to describe an enhanced model, which included building a prototype first and then using the prototype plus feedback between phases to build a final deployment]
Source: Managing the Development of Large Software Systems, Winston Royce, 1970
Waterfall ⇔ Agile
            Waterfall/Traditional (Plan Driven)   Agile (Value Driven)
Fixed       Requirements                          Resources, Date
Estimated   Resources, Date                       Requirements
Agile fixes the date and resources and varies the scope
Source: Agile Software Requirements, Dean Leffingwell, 2011
Quick & Dirty ≠ Agile (1)
www.agilemanifesto.org (principle #1):
“Our highest priority is to satisfy the customer through early and continuous delivery of valuable software” [emphasis added TB]
• Creating “technical debt” stands squarely in the way of continuous delivery, and maintaining a so-called “sustainable pace”: it creates (new) legacy!
Quick & Dirty ≠ Agile (2)
BI requirements (3)
• Information products (and changes) trigger new/more change requests: new data ⇒ insights ⇒ new requirements
Gerald M. (Jerry) Weinberg:
“Without stable requirements, development can’t stabilize, either”
Inmon ⇔ Kimball (1)
• Inmon: 3-tiered
• Kimball: 2-tiered
Inmon ⇔ Kimball (2)
Problems with Inmon
• Uncovering the ‘correct’ 3NF model requires scarce business expertise
• Unclear where 3NF model boundaries begin and end
• Model redesigns trigger a cascading nightmare of parent-child key updates
Problems with Kimball
• Smallest unit of delivery is a Star
• Incremental growth adds prohibitive overhead
• Dimensional structure is very rigid → not conducive to expansion or change
• Conforming dimensions is hard, in particular without access to data
3NF ⇔ Dimensional (1)
Source: Adapting Data Warehouse Architecture to Benefit from Agile Methodologies, Badari Boyine & Tom Breur, 2013
3NF ⇔ Dimensional (2)
• see: Kimball design tip #149 http://www.kimballgroup.com/2012/10/02/design-tip-149-facing-the-re-keying-crisis/
• this problem gets (much!) worse with multiple parent-child levels
3NF ⇔ Dimensional (3)
• Adding a new dimension, and changing granularity, requires doing the initial load again (because of the way surrogate keys are assigned)
Hyper normalized model
• Business keys, context attributes (history), and relations all have their own tables
• Appending “Supplier data” (or any other new source) to the model is therefore guaranteed to be contained as a “local” problem (= extension) in the data model
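The claim that a new source stays a “local” extension can be illustrated with a minimal sketch. It assumes a hub/satellite/link layout in SQLite; all table and column names (hub_product, sat_product, link_product_supplier, and so on) are hypothetical and not taken from the slides.

```python
import sqlite3

# Hypothetical table and column names, for illustration only.
BASE_DDL = """
CREATE TABLE hub_product (product_bk TEXT PRIMARY KEY, load_ts TEXT);
CREATE TABLE sat_product (product_bk TEXT REFERENCES hub_product,
                          load_ts TEXT, name TEXT, price REAL,
                          PRIMARY KEY (product_bk, load_ts));
"""

# Adding the "Supplier" source: new tables only, no change to existing ones.
SUPPLIER_DDL = """
CREATE TABLE hub_supplier (supplier_bk TEXT PRIMARY KEY, load_ts TEXT);
CREATE TABLE sat_supplier (supplier_bk TEXT REFERENCES hub_supplier,
                           load_ts TEXT, name TEXT, country TEXT,
                           PRIMARY KEY (supplier_bk, load_ts));
CREATE TABLE link_product_supplier (product_bk TEXT REFERENCES hub_product,
                                    supplier_bk TEXT REFERENCES hub_supplier,
                                    load_ts TEXT,
                                    PRIMARY KEY (product_bk, supplier_bk));
"""


def table_definitions(conn):
    """Return {table name: DDL text} as recorded by SQLite."""
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table'")
    return dict(rows.fetchall())


conn = sqlite3.connect(":memory:")
conn.executescript(BASE_DDL)
before = table_definitions(conn)

conn.executescript(SUPPLIER_DDL)  # extend the model with the new source
after = table_definitions(conn)

# Existing tables keep their definitions; the change is purely additive.
unchanged = all(after[name] == ddl for name, ddl in before.items())
added = sorted(set(after) - set(before))
print(unchanged, added)
```

Because the relation lives in its own link table, the product tables need no ALTER and no reload when the supplier source arrives.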
3-tiered BI architecture
[Diagram: sources (Legacy, OLTP, ERP, LOG files, External) ⇒ ETL ⇒ Staging Area ⇒ Data Warehouse, ODS ⇒ Datamart 1, Datamart 2, Datamart n ⇒ Business Intelligence Applications; Metadata; modeling styles labeled 3NF, hyper normalized, dimensional]
Source: Business Intelligence Architecture in Support of Data Quality, Tom Breur, 2013
BI value stream
• structural transformation
• semantic interpretation
• meaningful (& accurate) reporting
Business rules downstream
• Data in central hub (should) represent “true” state of affairs in source system(s)
• Load all data, all the time: the good, the bad, and the ugly
• Data in corporate reports are the “sum” of transformation & interpretation ⇒ avoid confounding of “errors”
• Auditability of data is rarely an explicit requirement, but always an implicit one (trust!)
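A minimal sketch of “business rules downstream”, assuming a toy in-memory hub. The record layout, the source name "crm", and the revenue rule are all hypothetical; the point is only that loading rejects nothing, and the rule is applied at reporting time while rejected rows remain traceable to their source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawRecord:
    source: str       # originating system (hypothetical name)
    source_row: int   # position in the source extract, kept for audit
    customer_id: str
    revenue: str      # stored exactly as delivered, even if malformed


def load_all(source, rows):
    """Back room: load every row as-is, the good, the bad, and the ugly."""
    return [RawRecord(source, i, r["customer_id"], r["revenue"])
            for i, r in enumerate(rows)]


def is_valid_revenue(rec):
    """Hypothetical business rule: revenue must be a non-negative number."""
    try:
        return float(rec.revenue) >= 0
    except ValueError:
        return False


def report(hub):
    """Front room: apply the rule at reporting time, keep rejects visible."""
    accepted = [r for r in hub if is_valid_revenue(r)]
    rejected = [r for r in hub if not is_valid_revenue(r)]
    total = sum(float(r.revenue) for r in accepted)
    return total, rejected


hub = load_all("crm", [
    {"customer_id": "C1", "revenue": "100.0"},
    {"customer_id": "C2", "revenue": "n/a"},
    {"customer_id": "C3", "revenue": "-5"},
    {"customer_id": "C4", "revenue": "50.5"},
])
total, rejected = report(hub)
print(total, [(r.source, r.source_row) for r in rejected])
```

If the rule later changes (say, negative revenue turns out to be a valid credit note), only `is_valid_revenue` changes; nothing has to be reloaded, and source errors stay separate from interpretation errors.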
Horses for courses (1)
• 3NF: quickly & accurately capture transaction data; easy to get data in
• Hyper normalized: integrate historical data; capture all data, all the time
• Dimensional: present & analyze data; easy to get data out
Horses for courses (2)
• 3NF: consistency, rigid model
• Hyper normalized: validity, flexibility
• Dimensional: meaning, ease of use
Backroom ⇔ Frontroom
[Diagram: the 3-tiered BI architecture (Legacy, OLTP, ERP, LOG files, External; ETL; Staging Area; Data Warehouse; ODS; Datamart 1, Datamart 2, Datamart n; Business Intelligence Applications; Metadata), divided into a Back room (Data Warehouse Architecture) and a Front room (Business Intelligence Architecture)]
Backroom ⇔ Frontroom
Backroom                              | Frontroom
Lean Manufacturing                    | Lean Product Development
“Push” metaphor                       | “Pull” metaphor
Focus on control & stability          | Focus on responsiveness & agility
Standardization → Change management   | Flexibility → Self-service
Persistent storage                    | Volatile/Virtual
Hyper normalized                      | Dimensional
Restricted access                     | Free-for-all
Change is an exception                | Change is the norm
Why data virtualization?
• Operational BI calls for (near) real-time data
• Data virtualization enables federation ⇒ you can delay (definitive) modeling, yet make data available (very) early on
• Adjusted deployments can be released almost immediately
• Piecemeal adjusting (“tweaking”) of misaligned Dimensions enables informed discussions ⇒ consequences (target setting)
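The “tweaking” of a dimension without reloading data can be sketched with a plain database view. This is an assumption-laden stand-in: a SQLite view is not a data virtualization or federation product, and the table, view, and column names below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sat_customer (customer_bk TEXT, segment TEXT, orders INTEGER);
INSERT INTO sat_customer VALUES ('C1', 'retail', 3),
                                ('C2', 'retail', 0),
                                ('C3', 'corporate', 7);
-- Provisional definition: everybody on file counts as a customer.
CREATE VIEW dim_customer AS
    SELECT customer_bk, segment FROM sat_customer;
""")
count_v1 = conn.execute("SELECT COUNT(*) FROM dim_customer").fetchone()[0]

# Stakeholders propose a stricter definition: at least one order.
# Only the (virtual) dimension is redefined; stored data is untouched.
conn.executescript("""
DROP VIEW dim_customer;
CREATE VIEW dim_customer AS
    SELECT customer_bk, segment FROM sat_customer WHERE orders > 0;
""")
count_v2 = conn.execute("SELECT COUNT(*) FROM dim_customer").fetchone()[0]
stored_rows = conn.execute("SELECT COUNT(*) FROM sat_customer").fetchone()[0]
print(count_v1, count_v2, stored_rows)
```

Comparing the two counts shows each stakeholder what a candidate definition would do to “their” numbers, which is what makes the discussion about conforming the dimension an informed one.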
Divide & Conquer (1)
• “Break down” semantic gap from Backroom to Frontroom
• Offer a range of data (self-) services:
  • Source data “as is”
  • Source data that have undergone cleansing
  • Dimensional models
  • Full-fledged BI applications
• Allow business to set (development) priorities!
Divide & Conquer (2)
• Transform (multiple) disparate (technical) source keys to meaningful Business Keys (aka “Business Vault”)
• Quickly model data (based on PK & FK) in hyper normalized structures for exploration and discovery (aka “Source Vault”)
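Transforming disparate technical source keys into a shared Business Key can be sketched as a cross-reference lookup. The mapping, system names, and key formats below are made up for illustration; in practice the cross-reference is itself governed data.

```python
# Hypothetical cross-reference: (source system, technical key) -> business key.
KEY_MAP = {
    ("crm", "10045"): "CUST-NL-0001",
    ("billing", "A-77"): "CUST-NL-0001",
    ("webshop", "u_913"): "CUST-NL-0002",
}


def to_business_key(source, technical_key):
    """Return the shared business key, or None when no mapping is known yet."""
    return KEY_MAP.get((source, technical_key))


def integrate(records):
    """Group source records around their business key; keep unmapped ones apart."""
    by_key, unmapped = {}, []
    for rec in records:
        bk = to_business_key(rec["source"], rec["key"])
        if bk is None:
            unmapped.append(rec)  # not dropped: visible for stewardship
        else:
            by_key.setdefault(bk, []).append(rec)
    return by_key, unmapped


records = [
    {"source": "crm", "key": "10045", "name": "J. Jansen"},
    {"source": "billing", "key": "A-77", "balance": 120.0},
    {"source": "webshop", "key": "u_913", "email": "x@example.com"},
    {"source": "webshop", "key": "u_999", "email": "y@example.com"},
]
by_key, unmapped = integrate(records)
print(sorted(by_key), len(by_key["CUST-NL-0001"]), len(unmapped))
```

Records without a known mapping are set aside rather than discarded, so the gaps in the cross-reference become a concrete work list for data stewards.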
Decomposition & synthesis
• Source elements are first decomposed, then “reassembled” (usually dimensional)
• Integration around business keys (orange) ⇒ identifiers for end-users
Source: Modeling the Agile Data Warehouse with Data Vault, Hans Hultgren, 2012
Entity ⇒ Ensemble
3NF Entity contains:
• Business Key
• Relations
• Descriptive attributes (= context & history)
All broken out in separate tables within the corresponding Ensemble
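The decomposition of one 3NF entity row into the parts of an Ensemble can be sketched as follows. The function name `decompose`, the dictionary layout, and the sample order row are assumptions made for illustration, not a prescribed Data Vault loading routine.

```python
def decompose(row, business_key, foreign_keys, load_ts):
    """Split one 3NF row into hub, link, and satellite rows.

    `business_key` names the column that identifies the entity,
    `foreign_keys` the columns that point at other entities; every
    remaining column is treated as descriptive context.
    """
    bk = row[business_key]
    hub = {"bk": bk, "load_ts": load_ts}
    links = [{"bk": bk, "related": col, "related_bk": row[col],
              "load_ts": load_ts}
             for col in foreign_keys]
    context = {col: val for col, val in row.items()
               if col != business_key and col not in foreign_keys}
    satellite = {"bk": bk, "load_ts": load_ts, **context}
    return hub, links, satellite


# Hypothetical sales order row from a 3NF source table.
order = {"order_no": "SO-1001", "customer_no": "CUST-NL-0001",
         "status": "shipped", "amount": 250.0}
hub, links, satellite = decompose(order, "order_no", ["customer_no"],
                                  "2013-04-15")
print(hub)
print(links)
print(satellite)
```

History then accumulates by appending satellite rows with a later `load_ts` for the same business key, while the hub row stays as it is.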
Hyper normalization +/-
Pro
• Quick (and inexpensive) to add new data elements
• Independence of tables enables parallelization (fast & scalable loading)
• Amenable to automation
Con
• More joins, slower queries
• Plethora of table constructs
• Table naming becomes critical and challenging
• Model is more difficult to “read”
• Less experience/resources available (new)
Data warehouse automation
DWH automation can take on several forms:
• Standardized processes, templates, etc.
• ETL/DDL generation
  • Staging
  • hub (for 3-tiered DWH architectures)
  • data marts
• Maintenance
  • version control
  • documenting “as built” design
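DDL generation, one of the automation forms listed above, can be sketched as templates filled in from metadata. The templates, the metadata dictionary, and the naming scheme are hypothetical; commercial DWH automation tools work from much richer metadata.

```python
import sqlite3

# Hypothetical templates; a real naming standard would be agreed by the team.
HUB_TEMPLATE = (
    "CREATE TABLE hub_{entity} (\n"
    "    {entity}_bk TEXT PRIMARY KEY,\n"
    "    load_ts TEXT NOT NULL,\n"
    "    record_source TEXT NOT NULL\n"
    ");"
)
SAT_TEMPLATE = (
    "CREATE TABLE sat_{entity} (\n"
    "    {entity}_bk TEXT NOT NULL REFERENCES hub_{entity},\n"
    "    load_ts TEXT NOT NULL,\n"
    "{columns}\n"
    "    PRIMARY KEY ({entity}_bk, load_ts)\n"
    ");"
)


def generate_ddl(metadata):
    """Emit one hub and one satellite statement per entity in the metadata."""
    statements = []
    for entity, attributes in metadata.items():
        statements.append(HUB_TEMPLATE.format(entity=entity))
        columns = "\n".join(f"    {name} {sql_type},"
                            for name, sql_type in attributes.items())
        statements.append(SAT_TEMPLATE.format(entity=entity, columns=columns))
    return statements


# Hypothetical metadata: entity -> {descriptive attribute: SQL type}.
metadata = {
    "customer": {"name": "TEXT", "segment": "TEXT"},
    "product": {"name": "TEXT", "price": "REAL"},
}
ddl = generate_ddl(metadata)

# Check that the generated statements are valid by running them.
conn = sqlite3.connect(":memory:")
for statement in ddl:
    conn.execute(statement)
tables = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
print(tables)
```

Because every hub and satellite follows the same pattern, adding an entity means adding a metadata entry rather than hand-writing DDL, which is what makes the hyper normalized style amenable to automation.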
Colors of the Data Vault
3NF Entity contains:
• Business key (blue), relations (red), and descriptive attributes (yellow)
Star schema:
• Business key & descriptive attributes (Dimension)
• Business key, relations & descriptive attributes (Fact table)
Conclusion (1)
• Most “hard” (requirements) work goes into bridging the “semantic gap”: opportunities abound for (slight) errors and misunderstanding
• When business silos disagree on entity definition, cracks in the corporate value flow occur ⇒ business misalignment
• Changes in entity numbers across business functions constitute a loss ($)
Conclusion (2)
• Committing (too) early to how data are modeled makes BI “owner” of the resulting solution
• That includes (all of) the data quality “errors” that result from design decisions
• Data traceability (backward lineage) enables BI to be a responsible steward, without becoming owner of (DQ) “errors”
Have your cookies and eat them too!