Agile Data Warehouse Design with Big Data John DiPietro & Jim Stagnitto 1

Agile Data Warehouse Design for Big Data Presentation


DESCRIPTION

Synopsis: [Video link: http://www.youtube.com/watch?v=ZNrTxSU5IQ0 ] Jim Stagnitto and John DiPietro of consulting firm a2c will discuss Agile Data Warehouse Design - a step-by-step method for data warehousing / business intelligence (DW/BI) professionals to better collect and translate business intelligence requirements into successful dimensional data warehouse designs. The method utilizes BEAM✲ (Business Event Analysis and Modeling) - an agile approach to dimensional data modeling that can be used throughout analysis and design to improve productivity and communication between DW designers and BI stakeholders. BEAM✲ builds upon the body of mature "best practice" dimensional DW design techniques, and collects "just enough" non-technical business process information from BI stakeholders to allow the modeler to slot their business needs directly and simply into proven DW design patterns. BEAM✲ encourages DW/BI designers to move away from the keyboard and their entity relationship modeling tools and begin "white board" modeling interactively with BI stakeholders. With the right guidance, BI stakeholders can and should model their own BI data requirements, so that they can fully understand and govern what they will be able to report on and analyze. The BEAM✲ method is fully described in Agile Data Warehouse Design - a text co-written by Lawrence Corr and Jim Stagnitto.

About the speakers:

Jim Stagnitto, Director of a2c Data Services Practice. Data Warehouse Architect specializing in powerful designs that extract the maximum business benefit from Intelligence and Insight investments. Master Data Management (MDM) and Customer Data Integration (CDI) strategist and architect. Data Warehousing, Data Quality, and Data Integration thought-leader: co-author with Lawrence Corr of "Agile Data Warehouse Design", guest author of Ralph Kimball’s “Data Warehouse Designer” column, and contributing author to Ralph and Joe Caserta's latest book: “The DW ETL Toolkit”.

John DiPietro, Chief Technology Officer at A2C IT Consulting. Mr. DiPietro is responsible for setting the vision, strategy, delivery, and methodologies for a2c’s Solution Practice Offerings for all national accounts. The a2c CTO brings with him an expansive depth and breadth of specialized skills in his field.

Sponsor note: thanks to Microsoft NERD for providing an awesome venue for the event, http://A2C.com IT Consulting for providing the food and drinks, and http://Cognizeus.com for providing a book to give away in the raffle.

Citation preview

Page 1: Agile Data Warehouse Design for Big Data Presentation

Agile Data Warehouse Design with Big Data

John DiPietro & Jim Stagnitto

!1

Page 2: Agile Data Warehouse Design for Big Data Presentation

Agenda

• Introduction / a2c Overview
• Modeling for End Users
• Role of Dimensional Models in Big Data
• Example: eCommerce
  • Structured Data: Sales
  • Semi-structured Data: Clickstream

• Agile Dimensional Modeling Overview

• Case Study Review

• Q&A

!2

Page 3: Agile Data Warehouse Design for Big Data Presentation

Introduction

• a2c
  • Boutique EDM (Enterprise Data Management) consultancy firm:
    • Data Warehousing
    • Master Data Management
    • Closed Loop Analytics and Visualization
    • Data & Application Architecture
• John DiPietro
  • Principal, Chief Technology Officer
• Jim Stagnitto
  • Data Warehouse & MDM Architect

!3

Page 4: Agile Data Warehouse Design for Big Data Presentation

a2c Corporate Overview & Industry Experience

!4

Page 5: Agile Data Warehouse Design for Big Data Presentation

Company Overview

• Technology Solution Consultancy headquartered in Philadelphia with regional offices in New York and Boston
• Servicing Healthcare, Life Science, Tel-Com and Financial Services industries with recent obtainment of our GSA schedule to pursue Federal Government opportunities
• Consultant base of over 2500 proven IT professionals throughout the North East Region with a recruiting network which provides national coverage
• Flexible approach to helping our clients with their initiatives
  • Project-based Solutions
  • Staff Augmentation
  • Managed Service Offerings – “On-Shore QA, Development & Application Support”
  • Executive & Professional Search

!5

Page 6: Agile Data Warehouse Design for Big Data Presentation

Competitive Advantage

• Founders of a2c were part of the fastest growing privately held IT consulting and staff augmentation firm in the US from 1994-2002. Our Executive Management Team has over 100 years of collective experience and has been responsible for delivering over a half-billion dollars of IT consulting and staff augmentation revenue from 1994 through to the present day.

• a2c’s Recruiting Engine and Methodology is one of the best in the industry, capable of producing quality results, on-demand for our clients

• Resource Managers continually “Silo” disciplines with available candidates who have proven their abilities with us over the last 10 years

• Our solutions organization is instrumentally involved during the screening and selection process to ensure that candidates submitted to our clients are an ideal match

• a2c’s Culture provides an ability to attract and retain the best talent in the industry and fosters creativity, integrity, growth and teamwork

• a2c provides our clients with an alternative solution to a “Big 4” consultancy at substantial savings for projects that are between $500K and $5M due to our flexibility, agility and focus

!6

Page 7: Agile Data Warehouse Design for Big Data Presentation

Representative Clients

03/19/12!7

Page 8: Agile Data Warehouse Design for Big Data Presentation

a2c Solution Engagement Structures

• Technology Strategy & Roadmap Formulation

• Needs & Readiness Assessment

• Package & Platform Selections

• Proof of Concept Implementation

• Requirements Discovery & Specifications

• Program/Project Management

• Full Life Cycle & Application Development

• Infrastructure & Facilities Initiatives

• Managed Services & Maintenance Support

!8

Page 9: Agile Data Warehouse Design for Big Data Presentation

a2c Solutions Capabilities

• Enterprise Data Management Practice helps clients manage their complete Information Lifecycle from their On-line Transactional systems to their Data Warehousing, Enterprise Reporting, Data Migration, Back-Up and Recovery Strategies (See Slide 7)

• Business Architecture & Optimization Practice utilizes “Six Sigma Lean” methodologies to analyze, re-engineer and automate our clients’ business processes, leveraging human workflow and business rules engine technologies to create efficiencies and provide business unit owners with the necessary metrics to continually improve performance

• Program Management Office oversees all aspects of solutions planning and delivery across client engagement teams and provides the methodology and frameworks which are based on PMI® industry standards

• Application Development & Managed Services Practice helps clients architect, implement and deploy the latest Microsoft and Enterprise Java based applications which are built on proven frameworks and architectures for the enterprise

• a2c's SDLC Delivery Model comprises over 20 years of collective best practices and industry-proven methodologies that allow our delivery teams to rapidly design, develop and implement solutions. Our SDLC model has been designed to complement our project management methodology, utilizing iterative development cycles that enable project teams to provide consistently high quality, on-time deliverables, regardless of technology platform

!9

Page 10: Agile Data Warehouse Design for Big Data Presentation

Agile DW Design Overview

!10

Page 11: Agile Data Warehouse Design for Big Data Presentation

Modeling for End Users

• How to Design to Answer Business Questions?
  • Think about how questions are articulated
  • And how the answers should be delivered
• Identify a common question framework
• Design an architecture that embraces and leverages this common question framework
• Utilize the best designs and technologies to:
  • (a) derive the answers
  • (b) present them in compelling ways that lead to the next interesting question!

!11

Page 12: Agile Data Warehouse Design for Big Data Presentation

How Do We Ask Questions?

“How do this quarter’s sales by sales rep of electronic products that we promoted to retail customers in the east compare with last year’s?”

[Slide annotations on the question: What (twice), Who (twice), When (twice), Where, Why]

!12

Page 13: Agile Data Warehouse Design for Big Data Presentation

How Do We Ask Questions?

• Events / Transactions
  • e.g. Sale
  • an immutable "fact" that occurs at a time and (typically) a place
• Interrogatives:
  • Who, What, When, Where, Why
  • Descriptive context that fully describes the event
  • a set of "dimensions" that describe events
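As a rough illustration of the event-plus-interrogatives idea, the sketch below models a few sale events as immutable Python records whose fields are named after the interrogatives, then answers a cut-down version of the sample question (sales by sales rep, per year, for promoted electronic products sold to retail customers in the east). All field names and sample values are hypothetical and are not taken from the presentation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: an event is an immutable fact
class SaleEvent:
    who_rep: str         # who (sales rep)
    who_segment: str     # who (customer segment)
    what_category: str   # what (product category)
    when_year: int       # when
    where_region: str    # where
    why_promoted: bool   # why (was the product promoted?)
    how_many: float      # how many (sale amount)


# Hypothetical sample events, invented for illustration only.
EVENTS = [
    SaleEvent("Ann", "retail", "electronics", 2012, "east", True, 100.0),
    SaleEvent("Ann", "retail", "electronics", 2013, "east", True, 150.0),
    SaleEvent("Bob", "retail", "electronics", 2013, "east", True, 80.0),
    SaleEvent("Bob", "wholesale", "electronics", 2013, "east", True, 999.0),
    SaleEvent("Ann", "retail", "furniture", 2013, "west", False, 40.0),
]


def sales_by_rep_and_year(events):
    """Filter on the who/what/where/why fields, then sum the measure
    grouped by (sales rep, year)."""
    totals = defaultdict(float)
    for e in events:
        if (e.who_segment == "retail" and e.what_category == "electronics"
                and e.where_region == "east" and e.why_promoted):
            totals[(e.who_rep, e.when_year)] += e.how_many
    return dict(totals)


RESULT = sales_by_rep_and_year(EVENTS)
```

The filter and the grouping use only the descriptive fields, while the sum touches only the measure, which is the split between dimensions and facts that the slide describes.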

!13

Page 14: Agile Data Warehouse Design for Big Data Presentation

Dimensional Value Proposition

• It makes sense to present answers to people using the same taxonomy of events and interrogatives (aka: facts and dimensions - dimensional structure) that they use when forming questions

• Events are instances of processes:

• It’s best to present information to people who will ask the system questions in dimensional form

• This is true regardless of the type of information being interrogated, its source, or IT stuff (like database technologies utilized)

• It’s best to model this presentation layer based on the events (aka: business processes) that underlie the questions

!14

Page 15: Agile Data Warehouse Design for Big Data Presentation

[Diagram labels: How Many, Why, Where, How, Who, When, What]

!15

Page 16: Agile Data Warehouse Design for Big Data Presentation

Scenarios

• A brief discussion of how and where dimensional modeling and/or databases fit within common and emerging “big data” data warehousing architectures

!16

Page 17: Agile Data Warehouse Design for Big Data Presentation

Kimball Dimensional DW

Dimensional BI Semantic Layer

Dimensional Data Warehouse

Data Movement / Integration

Source Data (Structured)

!17

Page 18: Agile Data Warehouse Design for Big Data Presentation

Kimball with Big Data

Dimensional BI Semantic Layer

Dimensional Data Warehouse

Data Movement / Integration Tier

Source Data Tier (Un/Semi-Structured)

Big Data Capture (e.g. HDFS)

Big Data Discovery (e.g. MR)

Data Movement / Integration Tier

Source Data Tier (Structured)

!18

Page 19: Agile Data Warehouse Design for Big Data Presentation

Corporate Information Factory (CIF)

Dimensional BI Semantic Layer

Dimensional Tier (Virtual or Physical)

Data Movement / Integration

Source Data (Structured)

Corporate Information Factory 3NF DW

!19

Page 20: Agile Data Warehouse Design for Big Data Presentation

CIF with Big Data

Dimensional BI Semantic Layer

Dimensional Tier (Virtual or Physical)

Data Movement / Integration Tier

Source Data Tier (Un/Semi-Structured)

Big Data Capture (e.g. HDFS)

Big Data Discovery (e.g. MR)

Data Movement / Integration Tier

Source Data Tier (Structured)

Corporate Information Factory 3NF DW

!20

Page 21: Agile Data Warehouse Design for Big Data Presentation

Data Vault

Dimensional BI Semantic Layer

Dimensional Tier (Virtual or Physical)

Data Movement / Integration

Source Data (Structured)

Data Vault

!21

Page 22: Agile Data Warehouse Design for Big Data Presentation

Data Vault with Big Data

Dimensional BI Semantic Layer

Dimensional Tier (Virtual or Physical)

Data Movement / Integration Tier

Source Data Tier (Un/Semi-Structured)

Big Data Capture (e.g. HDFS)

Big Data Discovery (e.g. MR)

Data Movement / Integration Tier

Source Data Tier (Structured)

Data Vault

!22

Page 23: Agile Data Warehouse Design for Big Data Presentation

Etc.

!23

Page 24: Agile Data Warehouse Design for Big Data Presentation

Common Framework

Dimensional BI Semantic Layer

Dimensional Tier [Physical (Kimball) or Virtual (CIF or Data Vault)]

Un/Semi-Structured Data Movement

Un/Semi-Structured Source Data

Persistent Un/Semi-Structured Staging Area

Unstructured -> Structured Data Discovery Processing

Structured Data Movement

Structured Source Data

Persistent Structured Data Repository (not needed for Kimball)

Insight Generation / Data Mining

!24

Page 25: Agile Data Warehouse Design for Big Data Presentation

Kitchen

• Off Limits to End Users
• Data Professionals Only Please
• Dangerous / Inhospitable Environment
• Data Assets “Not Ready for Primetime”
• Structured Variably For Data Processing

Dining Room

• Readily Accessible to End Users (and BI Developers)
• Safe, Hospitable Environment
• Data Assets “Ready for Primetime”
• Dimensionally Structured

Common Framework

Dimensional BI Semantic Layer

Dimensional Tier [Physical (Kimball) or Virtual (CIF or Data Vault)]

Un/Semi-Structured Data Movement

Un/Semi-Structured Source Data

Persistent Un/Semi-Structured Staging Area

Unstructured -> Structured Data Discovery Processing

Structured Data Movement

Structured Source Data

Persistent Structured Data Repository (not needed for Kimball)

eCommerce Example: Clickstream Data, eCommerce Sale

!25

Page 26: Agile Data Warehouse Design for Big Data Presentation

eCommerce Example: Clickstream

[Figure: Raw Clickstream Data, shown on the slide as a dense block of space-separated integers]

Semi-Structured

Recording of every page request made by a user

Includes some structural elements – such as when the request was made and who the user is

Requires significant prep work in order to fit into a traditional row-based relational database

Apples and Oranges: Pre-Sessionized Page Visits, Detailed Product Views, Catalogue Requests, Shopping Cart Adds / Deletes / Abandons, etc.

Needs to be converted into separate-but-relatable dimensional facts - with many shared (conformed) dimensions
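One piece of the prep work mentioned above is sessionizing raw page requests before they can be loaded as page-visit facts. The following is a minimal sketch of that step only. The 30-minute inactivity timeout, the tuple layout, and the function name `sessionize` are assumptions made for illustration. The presentation does not prescribe them.

```python
from datetime import datetime, timedelta

# Assumed inactivity gap that ends a session; 30 minutes is a common
# convention, not a value given in the presentation.
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(requests):
    """Assign a per-user session number to each (user, timestamp, url) request.

    Returns a list of dicts sorted by user and time. The 'session' value
    increments when the gap since the user's previous request exceeds
    SESSION_TIMEOUT.
    """
    rows = []
    last_seen = {}   # user -> timestamp of that user's previous request
    session_no = {}  # user -> current session number
    for user, ts, url in sorted(requests, key=lambda r: (r[0], r[1])):
        previous = last_seen.get(user)
        if previous is None or ts - previous > SESSION_TIMEOUT:
            session_no[user] = session_no.get(user, 0) + 1
        last_seen[user] = ts
        rows.append({"user": user, "time": ts, "url": url,
                     "session": session_no[user]})
    return rows


# Hypothetical page requests: (user id, request time, requested URL).
RAW = [
    ("u1", datetime(2013, 1, 1, 9, 0), "/home"),
    ("u1", datetime(2013, 1, 1, 9, 5), "/product/42"),
    ("u1", datetime(2013, 1, 1, 11, 0), "/cart"),
    ("u2", datetime(2013, 1, 1, 9, 1), "/home"),
]

PAGE_VISITS = sessionize(RAW)
```

Each output row carries a who (user), a when (time), a where (url) and a session identifier, so it can be related to other facts through shared dimensions.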

!26

Page 27: Agile Data Warehouse Design for Big Data Presentation

Typical Clickstream “Page View” Dimensional Model

[Diagram labels: What, Why, Who, When, What]

!27

Page 28: Agile Data Warehouse Design for Big Data Presentation

eCommerce Example: Web Sales

• Fully Structured
• The Sale Transaction typically carries all fundamental dimensions:
  • Time
  • Customer
  • Referring URL / Search Phrase
  • Product
  • Purchase and/or Shipment (Geo or URL) Locations
  • Promotion / Campaign
  • Etc.
• And “How Many” Measures
  • Unit and Price Quantities / Amounts
  • Discount Amounts
  • Etc.

!28

Page 29: Agile Data Warehouse Design for Big Data Presentation

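eCommerce Dimensionality

Dimensions (columns): Time (When), Customer (Who), Web Page (Where), Product (What), Referring URL (Where), Promotion / Campaign (Why), Activity Type (How)

Facts (rows); the ✔ marks fall in the remaining dimension columns, and their exact positions are not preserved in this transcript:

• Page Visit: Time = View Start, View End, Session Start, Session End; Customer = Visitor; Web Page = Current, Previous, Next; ✔
• Detailed Product View: Time = View Start, View End, Session Start, Session End; Customer = Prospect; Web Page = Current, Previous, Next; ✔ ✔
• Shopping Cart Activity: Time = Activity Start, Activity End; Customer = Prospect; ✔ ✔ ✔ ✔
• Sale (Checkout): Time = Sale Start, Sale End; Customer = Customer; ✔ ✔ ✔ ✔
• Shipment / Delivery: Time = Shipment, Delivery; Customer = Customer, Delivery Recipient; ✔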

!29

Page 30: Agile Data Warehouse Design for Big Data Presentation

Agile DW Design Overview

!30

Page 31: Agile Data Warehouse Design for Big Data Presentation

The first dimensional modeler:

R.K.

Ralph Kimball?

Rudyard Kipling

!31

Page 32: Agile Data Warehouse Design for Big Data Presentation

I keep six honest serving-men
(They taught me all I knew);
Their names are What and Why and When
And How and Where and Who…

–Rudyard Kipling

!32

Page 33: Agile Data Warehouse Design for Big Data Presentation

Who

!33

Page 34: Agile Data Warehouse Design for Big Data Presentation

What

!34

Page 35: Agile Data Warehouse Design for Big Data Presentation

When

!35

Page 36: Agile Data Warehouse Design for Big Data Presentation

Where

!36

Page 37: Agile Data Warehouse Design for Big Data Presentation

Why

!37

Page 38: Agile Data Warehouse Design for Big Data Presentation

How

!38

Page 39: Agile Data Warehouse Design for Big Data Presentation

How Many

!39

Page 40: Agile Data Warehouse Design for Big Data Presentation

The 7Ws Framework

Page 41: Agile Data Warehouse Design for Big Data Presentation

[Diagram labels: How Many, Why, Where, How, Who, When, What]

Page 42: Agile Data Warehouse Design for Big Data Presentation

How did we get here?

Page 43: Agile Data Warehouse Design for Big Data Presentation

DW Architectures: A Brief History

• Corporate Information Factory: Data-Driven Analysis
• Undisciplined Dimensional: Report-Driven Analysis
• Dimensional Bus Architecture: Process-Driven Analysis

Page 44: Agile Data Warehouse Design for Big Data Presentation

7Ws Dimensional Model

• How (Facts): How Much, How Many, How Often (£ $ €)
• Where: Location, Geographic, Store, Ship To, Hospital
• Who: Customer, Employee, Third Party, Organization
• What: Product, Service, Transactions
• When: Time, Day, Month, Fiscal Period
• Why: Causal, Promotion, Reason, Weather, Competition
• ??

Page 45: Agile Data Warehouse Design for Big Data Presentation

[Diagram labels: Where, Who, When, What, How, Why, How Many]

BEAM

Business Event Analysis & Modeling

Page 46: Agile Data Warehouse Design for Big Data Presentation

How do you design a data warehouse?

Page 47: Agile Data Warehouse Design for Big Data Presentation

Tech Design Artifacts?

CALENDAR
Date Key
Date, Day, Day in Week, Day in Month, Day in Qtr, Day in Year, Month, Qtr, Year, Weekday Flag, Holiday Flag

PRODUCT
Product Key
Product Code, Product Description, Product Type, Brand, Subcategory, Category

PROMOTION
Promotion Key
Promotion Code, Promotion Name, Promotion Type, Discount Type, Ad Type

SALES FACT
Keys: Date Key, Product Key, Store Key, Promotion Key
Measures: Quantity Sold, Revenue, Cost, Basket Count

STORE
Store Key
Store Code, Store Name, URL, Store Manager, Region, Country

Page 48: Agile Data Warehouse Design for Big Data Presentation

OK, Now Validate with

Page 49: Agile Data Warehouse Design for Big Data Presentation

Why Agile Data Warehousing?

Page 50: Agile Data Warehouse Design for Big Data Presentation

Waterfall BI/DW

Phases: Analysis, Design, Development, Test, Release

Limited Stakeholder interaction

[Diagram labels: Stakeholder Input, Requirements, Data Model, ETL, BI, DATA, VALUE?, BDUF, This Year, Next Year]

Page 51: Agile Data Warehouse Design for Big Data Presentation

Agile DW/BI Development

Iterations: Iteration 1, Iteration 2, Iteration 3, Iteration …, Iteration n

Stakeholder interaction

[Diagram labels: VALUE?, VALUE, VALUE!, BI Prototyping, ETL?, ADM, ETL, BI, Rev, Review, Release, JEDUF, DATA, This Year, Next Year]

Page 52: Agile Data Warehouse Design for Big Data Presentation

State of The DW Field

Solid:

Dimensional Data Warehouse Design is Mature

Proven Design Patterns Exist for Common Requirements

Hit or Miss:

Collecting Unambiguous and Thorough Requirements

Slotting Requirements into Proven Design Patterns

End-User Ownership and Validation

Too Often: Snatching Defeat from the Jaws of Victory

!52

Page 53: Agile Data Warehouse Design for Big Data Presentation

Quick

Modelstorming

Data Modeler BI Stakeholders

Inclusive

Interactive Fun

Page 54: Agile Data Warehouse Design for Big Data Presentation

Structured, non-technical, collaborative working conversation directly with BI Users

• BI User’s Business Process, Organizational, Hierarchical, and Data Knowledge

• Focused Data Profiling

• Logical and Physical (Kimball-esque) Dimensional Data Models
• Example data
• Detailed and Testable ETL Specification
• Instantiated DW Prototype

BEAM✲

BEAM✲ Methodology

Data Modeler BI Stakeholders

Page 55: Agile Data Warehouse Design for Big Data Presentation

Requirements = Design


Page 56: Agile Data Warehouse Design for Big Data Presentation

Collaboration at Every Step

Page 57: Agile Data Warehouse Design for Big Data Presentation

Agile Data Modeling Requirements

• Techniques for encouraging interaction

• Must use simple, inclusive notation and tools

• Must be quick: hours rather than days – modelstorming

• Balance ‘just in time’ (JIT) and ‘just enough design up front’ (JEDUF) to reduce design rework

• DW designers must embrace data model change, allow models to evolve, avoid generic data models; need design patterns they can trust to represent tomorrow’s BI requirements tomorrow

• ETL and BI developers must embrace database change; need tool support

!57

Page 58: Agile Data Warehouse Design for Big Data Presentation

What kind of model?

Page 59: Agile Data Warehouse Design for Big Data Presentation
Page 60: Agile Data Warehouse Design for Big Data Presentation

CALENDAR
Date Key
Date, Day, Day in Week, Day in Month, Day in Qtr, Day in Year, Month, Qtr, Year, Weekday Flag, Holiday Flag

PRODUCT
Product Key
Product Code, Product Description, Product Type, Brand, Subcategory, Category

PROMOTION
Promotion Key
Promotion Code, Promotion Name, Promotion Type, Discount Type, Ad Type

SALES FACT
Keys: Date Key, Product Key, Store Key, Promotion Key
Measures: Quantity Sold, Revenue, Cost, Basket Count

STORE
Store Key
Store Code, Store Name, URL, Store Manager, Region, Country
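To make the star schema above concrete, here is a small sketch that builds a cut-down version of it in an in-memory SQLite database and runs one dimensional query. Only a subset of the attributes listed on the slide is included, and the snake_case column names, the sample rows, and the query are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Cut-down star schema: four dimension tables around one fact table.
conn.executescript("""
CREATE TABLE calendar  (date_key INTEGER PRIMARY KEY, date TEXT, month INTEGER, year INTEGER);
CREATE TABLE product   (product_key INTEGER PRIMARY KEY, product_code TEXT, category TEXT);
CREATE TABLE store     (store_key INTEGER PRIMARY KEY, store_name TEXT, region TEXT);
CREATE TABLE promotion (promotion_key INTEGER PRIMARY KEY, promotion_name TEXT);
CREATE TABLE sales_fact (
    date_key      INTEGER REFERENCES calendar(date_key),
    product_key   INTEGER REFERENCES product(product_key),
    store_key     INTEGER REFERENCES store(store_key),
    promotion_key INTEGER REFERENCES promotion(promotion_key),
    quantity_sold INTEGER,
    revenue       REAL
);
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO calendar VALUES (?, ?, ?, ?)",
                 [(1, "2013-01-05", 1, 2013)])
conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [(1, "P-001", "Electronics"), (2, "P-002", "Furniture")])
conn.executemany("INSERT INTO store VALUES (?, ?, ?)",
                 [(1, "Boston", "East"), (2, "Seattle", "West")])
conn.executemany("INSERT INTO promotion VALUES (?, ?)",
                 [(1, "No Promotion"), (2, "Winter Sale")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, 1, 2, 2, 200.0),
    (1, 1, 2, 1, 1, 100.0),
    (1, 2, 1, 1, 3, 300.0),
    (1, 1, 1, 1, 1, 100.0),
])

# "Revenue by product category and store region": join the fact table to
# the dimensions it needs and sum the additive measure.
ROWS = conn.execute("""
    SELECT p.category, s.region, SUM(f.revenue)
    FROM sales_fact AS f
    JOIN product AS p ON p.product_key = f.product_key
    JOIN store   AS s ON s.store_key   = f.store_key
    GROUP BY p.category, s.region
    ORDER BY p.category, s.region
""").fetchall()
```

The query shape (filter and group by dimension attributes, aggregate fact measures) stays the same whichever dimensions a question involves, which is part of what makes this layout approachable for BI users.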

Page 61: Agile Data Warehouse Design for Big Data Presentation

Customer

Country

Customer Type

Product Type

Category

Product

Month

Calendar

Holiday Type

Store Type

Store

City

Sales Fact

Page 62: Agile Data Warehouse Design for Big Data Presentation

Modeling by Abstraction

Page 63: Agile Data Warehouse Design for Big Data Presentation

Modeling by Example

Page 64: Agile Data Warehouse Design for Big Data Presentation

Agile DW Design Process


Page 65: Agile Data Warehouse Design for Big Data Presentation

Who does what?

Subjects Verb Objects

“Customers buy products”

BEAM✲ Modeler BI Users

Collaborative / Conversational Design

Page 66: Agile Data Warehouse Design for Big Data Presentation

Design Using Natural Language

• Verbs – Events – Relationships – Fact Tables

• Nouns – Details – Entities – Dimensions

• Main Clause – Subject-Verb-Object

• Prepositions – connect additional details to the main clause

• Interrogatives – The 7Ws – Dimension Types

• Business Vocabulary - no IT-Speak

!66

Page 67: Agile Data Warehouse Design for Big Data Presentation

“Spreadsheet”-like Models

Details

Example Data (4-6 rows)

Subject Column Name

Object Column Name

Verb

Interrogative

Event Table Name (filled in later)
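The spreadsheet-like layout can be approximated in code. The sketch below is not BEAM✲ notation or tooling, only a hypothetical Python structure that mirrors the parts named on the slide: a verb, columns tagged with an interrogative, a few example rows, and a table name filled in later. The class name, method names, and sample data are all invented for illustration.

```python
from dataclasses import dataclass, field

SEVEN_WS = ("who", "what", "when", "where", "how many", "why", "how")


@dataclass
class EventTable:
    """A spreadsheet-like event model: typed columns plus example rows."""
    verb: str
    columns: list = field(default_factory=list)   # (column name, interrogative)
    examples: list = field(default_factory=list)  # one dict per example row
    name: str = ""                                # event table name, filled in later

    def add_detail(self, column, interrogative):
        """Add a column and tag it with one of the 7Ws."""
        if interrogative not in SEVEN_WS:
            raise ValueError(f"unknown interrogative: {interrogative}")
        self.columns.append((column, interrogative))

    def add_example(self, row):
        """Add an example row; it must supply a value for every column."""
        expected = {column for column, _ in self.columns}
        if set(row) != expected:
            raise ValueError(f"example must supply exactly: {sorted(expected)}")
        self.examples.append(row)

    def columns_for(self, interrogative):
        """Return the column names tagged with the given interrogative."""
        return [c for c, i in self.columns if i == interrogative]


# "Customers buy products", modeled with hypothetical example data.
orders = EventTable(verb="buy")
orders.add_detail("CUSTOMER", "who")        # subject
orders.add_detail("PRODUCT", "what")        # object
orders.add_detail("ORDER DATE", "when")
orders.add_detail("QUANTITY", "how many")
orders.add_example({"CUSTOMER": "J. Smith", "PRODUCT": "Laptop",
                    "ORDER DATE": "2013-01-05", "QUANTITY": 1})
orders.add_example({"CUSTOMER": "ACME Ltd", "PRODUCT": "Printer",
                    "ORDER DATE": "2013-01-06", "QUANTITY": 20})
orders.name = "CUSTOMER ORDERS"
```

Requiring every example row to fill every column is a design choice of this sketch: it forces a conversation about missing values at the point an example is written down.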

Page 68: Agile Data Warehouse Design for Big Data Presentation

Straightforward Methodology

[Diagram: the columns Who, What, When, Where, How (many), Why, How, annotated with step numbers 1 to 9]

Declare Event Type

Subject-Verb-Object

Quantities - Facts

Sufficient Detail Fact Granularity

Initial Data Examples

Page 69: Agile Data Warehouse Design for Big Data Presentation

Capture Example Data

Engage business users

Clarify definitions / Conform Dimensions

Illustrate exceptions

Drive out uniqueness

“Show and tell”

verb on/at/every

SUBJECT OBJECT EVENT DATE

[who] [what] [when] [where] [how many] [why] [how]

Typical Typical/Popular Typical Typical Typical/Average Typical/Normal Typical/Normal

Different Different Different Different Different Different Different

Repeat Repeat Repeat Repeat Repeat Repeat Repeat

Missing Missing Missing Missing Missing Missing Missing

Group Multiple/Bundle Multi-Level Multiple Values

Old, Low Old, Low Value Oldest needed Near Min, Negative, 0

New, High New, High Most Recent, Future Far Max, Precision Exceptional Exceptional

Page 70: Agile Data Warehouse Design for Big Data Presentation

Thoughtful Example Data

Detailed ETL Specification

Page 71: Agile Data Warehouse Design for Big Data Presentation

Identify Event Type Early

Page 72: Agile Data Warehouse Design for Big Data Presentation

Adjust Conversation Based on Event Type

• Discrete Event -> Transaction
  • Instantaneous/short duration, irregularly occurring events or transactions
• Recurring Event -> Periodic Snapshot – measurement
  • Regularly occurring events, ongoing processes, typically used to measure the cumulative effect of discrete events
• Evolving Event -> Accumulating Snapshot – timeline
  • Non-instantaneous/longer duration, irregularly occurring events or transactions
  • Represents current status - reflects adjustments
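The relationship between the first two event types can be sketched in a few lines: a periodic snapshot is a regular measurement that accumulates the effect of discrete transactions. The example below assumes a single product and hypothetical stock movements. The data and the function name `periodic_snapshot` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical discrete events (transaction fact rows) for one product:
# (day, quantity change). Positive = stock received, negative = stock sold.
TRANSACTIONS = [
    (1, +10),
    (1, -3),
    (2, -2),
    (3, +5),
    (3, -4),
]


def periodic_snapshot(transactions, days):
    """Build one row per day holding the cumulative quantity on hand:
    a recurring measurement derived from the discrete events."""
    change_by_day = defaultdict(int)
    for day, delta in transactions:
        change_by_day[day] += delta

    snapshot = []
    running = 0
    for day in days:
        running += change_by_day[day]  # days with no events contribute 0
        snapshot.append((day, running))
    return snapshot


SNAPSHOT = periodic_snapshot(TRANSACTIONS, days=range(1, 5))
```

The snapshot has a row for day 4 even though nothing happened that day. Transaction facts exist only when an event occurs, whereas periodic snapshots are taken on a regular schedule.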

!72

Page 73: Agile Data Warehouse Design for Big Data Presentation

Capture When Details

When do Customers order Products?

BEAM✲ Modeler

BI Users

“On the Order Date”

Page 74: Agile Data Warehouse Design for Big Data Presentation

Any other Whens?

Page 75: Agile Data Warehouse Design for Big Data Presentation

Any other Whos?

Page 76: Agile Data Warehouse Design for Big Data Presentation

And so on...

Page 77: Agile Data Warehouse Design for Big Data Presentation

Model How Many Measures

• Additive – can be summed up over any combination of dimensions. No special rules
• Non-additive – cannot be summed over any dimension e.g. unit price or temperature
  • Must be aggregated in other ways e.g. average, min, max
• Degenerate Dimensions – transaction #, timestamps, flags
• Semi-additive – cannot be summed across at least one dimension e.g. balances cannot be summed over time
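A small worked example may help separate the three kinds of measure. The rows below are hypothetical and mix an additive measure (units sold), a semi-additive one (account balance), and a non-additive one (unit price). The row layout and all values are assumptions made purely for illustration.

```python
# Hypothetical rows: (day, account, balance, units_sold, unit_price)
ROWS = [
    (1, "A", 100.0, 2, 5.0),
    (1, "B", 50.0, 1, 8.0),
    (2, "A", 120.0, 3, 5.0),
    (2, "B", 40.0, 4, 8.0),
]

# Additive: units sold can be summed over every dimension (day and account).
total_units = sum(units for _, _, _, units, _ in ROWS)


# Semi-additive: balances can be summed across accounts for a single day,
# but not across days. Over time, take the latest day's total instead.
def balance_on(day):
    return sum(balance for d, _, balance, _, _ in ROWS if d == day)


closing_balance = balance_on(max(d for d, *_ in ROWS))

# Non-additive: unit prices are never summed. Here they are aggregated as
# an average weighted by units sold (revenue divided by units).
revenue = sum(units * price for _, _, _, units, price in ROWS)
avg_unit_price = revenue / total_units
```

Summing the balance column over all four rows would give 310.0, a figure that corresponds to no real quantity, which is the trap the semi-additive label is meant to flag.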

!77

Page 78: Agile Data Warehouse Design for Big Data Presentation

Modeling Dimensions

Page 79: Agile Data Warehouse Design for Big Data Presentation

Annotate with Targeted Data Profiling

Page 80: Agile Data Warehouse Design for Big Data Presentation

Proceed Through the Business Process Value Chain

Page 81: Agile Data Warehouse Design for Big Data Presentation

Collaborative Dimension Conformance

Dimensions

Time Shipper Customer Plant Response Product Promotion

Sales

Campaigns

Page 82: Agile Data Warehouse Design for Big Data Presentation

Identify Hierarchy Types

Balanced

Complex

Simple

Ragged Variable Depth

Page 83: Agile Data Warehouse Design for Big Data Presentation

Graphically Depict Hierarchies

Page 84: Agile Data Warehouse Design for Big Data Presentation

Visualize The Hierarchies

Page 85: Agile Data Warehouse Design for Big Data Presentation

Paint The Organization

Page 86: Agile Data Warehouse Design for Big Data Presentation

Prototype! Not “Data Model Review”

Page 87: Agile Data Warehouse Design for Big Data Presentation

Recap

• Collaborative and Agile
  • Data Modeling
  • Data Sourcing
  • Data Conformance
• Requirements = Design
  • Slots directly into proven and mature dimensional data warehousing design patterns
• Validation through Prototyping
  • Semi-automated build of dimensional data warehouse
• Perfect complement to Agile BI Tools and Methods (e.g. Pentaho)

!87

Page 88: Agile Data Warehouse Design for Big Data Presentation

If you have been affected by any of the issues raised

in this presentation

Page 89: Agile Data Warehouse Design for Big Data Presentation

Agile Data Warehouse Design

Lawrence Corr, Jim Stagnitto, Decision Press, November 2011

Page 90: Agile Data Warehouse Design for Big Data Presentation

Questions / Comments