DATA WAREHOUSING A REWARDING CAREER RALPH KIMBALL MARCH 2015 Data Warehousing © Ralph Kimball, 2015 March 2015

A Classy Problem

• Challenge worthy of the best minds
• Durable, permanent: no quick technical fixes
• Highly visible and important
• Constant new challenges
• Enormous investments in people and technology
• Good salaries
• Interesting careers

Successful Qualities

You need to be interested in three things:
• The business
• The technology
• And what is #3?

People!

The mission of the data warehouse is to deliver information most effectively to decision makers, who:
• Are not technology enthusiasts
• Do not read the manuals
• But are VERY motivated to use information to make decisions

You need to love business users in spite of their frailties.

Deliver Most Effectively

• Simple
  - Obvious, recognizable
  - Relevant, actionable
  - Minimize the number of cognitive subgoals: count the clicks…
• Fast
  - Keep your hand on the mouse
  - Don’t leave your desk
  - Remember the lesson of Google

Don’t give me the flimsy excuse that it’s OK for the query to run for 10 minutes because the answer is “complex” or “important”. We just searched billions of web pages in less than a second.

What’s a Meta For?

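The Restaurant Metaphor: you have a kitchen (the back room, where the ETL system prepares the data) and a dining room (the presentation area, where business users consume it).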

Building the Presentation Server: The Platform for BI

• Dimensional models (star schemas)
• Driven from business process SOURCES, not reports
• Inherently distributed, but we will INTEGRATE
• Faithfully maintains history
• Gracefully extensible, agile compatible
• Development built on standard techniques

Expose the Star Schema in the UI: Platform for BI

• Sales fact table: Time Key (FK), Product Key (FK), Store Key (FK), Promotion Key (FK), Dollars, Units, Cost
• Time dimension: Time Key (PK), SQL date, Day of Week, Week Number, Month
• Product dimension: Product Key (PK), SKU, Description, Brand, Category, Package Type, Size, Flavor
• Store dimension: Store Key (PK), Store ID, Store Name, Address, District, Region
• Promotion dimension: Promotion Key (PK), Promotion Name, Promotion Type, Price Treatment, Ad Treatment, Display Treatment, Coupon Type

Report built by drag and drop from the schema (Gross Profit is computed):

District    Brand        Total Dollars    Total Cost    Gross Profit
Atherton    Clean Fast   $1,233           $1,058        $175
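To make the drag-and-drop report concrete, here is a minimal Python sketch using the standard-library sqlite3 module. It assumes a cut-down version of the slide's schema (only the Product and Store dimensions, with invented table and column names) and a few invented fact rows whose totals happen to match the report above. Dimension attributes become GROUP BY row headers, the facts are summed, and Gross Profit is computed from the two additive facts.

```python
import sqlite3

# Hypothetical miniature of the slide's retail star schema:
# one fact table plus two of its dimensions (Time and Promotion omitted).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, sku TEXT, brand TEXT);
CREATE TABLE store_dim   (store_key   INTEGER PRIMARY KEY, store_name TEXT, district TEXT);
CREATE TABLE sales_fact  (product_key INTEGER REFERENCES product_dim,
                          store_key   INTEGER REFERENCES store_dim,
                          dollars REAL, units INTEGER, cost REAL);
""")
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)",
                 [(1, "SKU-1", "Clean Fast"), (2, "SKU-2", "Clean Fast")])
conn.executemany("INSERT INTO store_dim VALUES (?, ?, ?)",
                 [(10, "Store A", "Atherton"), (11, "Store B", "Atherton")])
# Illustrative fact rows, invented so that the totals match the report on the slide.
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)",
                 [(1, 10, 700.0, 70, 600.0), (2, 11, 533.0, 50, 458.0)])

# Dimension attributes are the row headers; facts are aggregated;
# gross profit is derived from the two additive facts.
query = """
SELECT s.district, p.brand,
       SUM(f.dollars)               AS total_dollars,
       SUM(f.cost)                  AS total_cost,
       SUM(f.dollars) - SUM(f.cost) AS gross_profit
FROM sales_fact f
JOIN product_dim p ON f.product_key = p.product_key
JOIN store_dim   s ON f.store_key   = s.store_key
GROUP BY s.district, p.brand
"""
rows = conn.execute(query).fetchall()
print(rows)
```

A BI tool typically generates a query of this general shape behind the scenes; the exact SQL any particular tool emits will differ.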

Basic Modeling Techniques

Four steps in the design:
• Choose the process: the data source
• Choose the grain: the business definition of the measurement
• Choose the dimensions: single valued in the presence of the grain
• Choose the facts: true to the grain
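The third step can be checked mechanically. The Python sketch below is illustrative only: it assumes source transactions held as dictionaries and a declared grain of day, product, and store (both hypothetical), and tests whether a candidate dimension takes exactly one value for every grain key.

```python
from collections import defaultdict

# Hypothetical declared grain: one fact row per day, product, and store.
GRAIN = ("date", "product", "store")

def is_single_valued(source_rows, candidate, grain=GRAIN):
    """True if the candidate attribute has exactly one value per grain key."""
    values = defaultdict(set)
    for row in source_rows:
        values[tuple(row[col] for col in grain)].add(row[candidate])
    return all(len(v) == 1 for v in values.values())

# Invented source transactions for illustration.
source = [
    {"date": "2015-03-01", "product": "Framis", "store": "S1",
     "promotion": "None", "customer": "C7", "units": 3},
    {"date": "2015-03-01", "product": "Framis", "store": "S1",
     "promotion": "None", "customer": "C9", "units": 2},
    {"date": "2015-03-01", "product": "Toggle", "store": "S1",
     "promotion": "Coupon", "customer": "C7", "units": 5},
]

print(is_single_valued(source, "promotion"))  # True: usable as a dimension
print(is_single_valued(source, "customer"))   # False: many customers per grain row
```

Here promotion qualifies as a dimension at this grain, while customer does not; including customer would require a finer grain.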

Dimensions are the Soul of the DW

• Wide, verbose, denormalized
• Ideal for bitmapped indexes
• Attributes are the source of constraints and groupings
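Low-cardinality dimension attributes suit bitmap indexes because each distinct value needs only one bit per row. A toy Python sketch, using plain integers as bit sets and invented product rows (real databases implement compressed bitmaps quite differently), shows how a constraint on two attributes reduces to a bitwise AND:

```python
from collections import defaultdict

def build_bitmap_index(rows, attribute):
    """Map each distinct attribute value to an integer bitmap of row positions."""
    index = defaultdict(int)
    for position, row in enumerate(rows):
        index[row[attribute]] |= 1 << position
    return dict(index)

def matching_rows(bitmap):
    """Decode a bitmap into the list of row positions whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical rows of a product dimension.
products = [
    {"sku": "A1", "brand": "Clean Fast", "category": "Cleaners"},
    {"sku": "A2", "brand": "Framis",     "category": "Cleaners"},
    {"sku": "B1", "brand": "Clean Fast", "category": "Soaps"},
    {"sku": "B2", "brand": "Toggle",     "category": "Soaps"},
]
by_brand = build_bitmap_index(products, "brand")
by_category = build_bitmap_index(products, "category")

# Constraining two attributes at once is a single AND of their bitmaps.
hits = by_brand["Clean Fast"] & by_category["Soaps"]
print([products[i]["sku"] for i in matching_rows(hits)])  # ['B1']
```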

Resist the Urge to Snowflake the UI

A denormalized dimension has exactly the same content!
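One way to see that snowflaking adds tables but not content: the Python sketch below assumes a hypothetical product dimension normalized into product, brand, and category tables, and flattens it back into one wide row per product. Every attribute value referenced by a product reappears in the flat rows; only the joins disappear from the user's view.

```python
# Hypothetical snowflaked product dimension: brand and category are split
# out into their own normalized tables.
category_table = {100: "Cleaners", 200: "Soaps"}
brand_table = {10: {"brand": "Clean Fast", "category_id": 100},
               20: {"brand": "Toggle", "category_id": 200}}
product_table = [{"sku": "A1", "brand_id": 10},
                 {"sku": "B2", "brand_id": 20}]

def flatten(products, brands, categories):
    """Denormalize the snowflake into one wide row per product."""
    flat = []
    for p in products:
        b = brands[p["brand_id"]]
        flat.append({"sku": p["sku"],
                     "brand": b["brand"],
                     "category": categories[b["category_id"]]})
    return flat

product_dim = flatten(product_table, brand_table, category_table)
print(product_dim)
```

The user now sees a single Product table with Brand and Category as ordinary columns, instead of three tables to join.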

Facts are 1-to-1 with Measurements

Fact record = event; event = fact record

Keeping the Pledge: Track History

• Track history with Slowly Changing Dimensions
• Type 1 SCD: Overwrite
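A Type 1 change simply overwrites the attribute. In this minimal Python sketch, the customer dimension, its field names, and the helper are hypothetical; the point is that after the update the prior city is gone, so all history reports under the new value.

```python
# Hypothetical customer dimension keyed by surrogate key.
customer_dim = {
    1: {"customer_id": "C7", "name": "Ann Lee", "city": "Atherton"},
}

def apply_type1(dim, natural_key, changes):
    """Type 1 SCD: overwrite attributes in place; the old values are lost."""
    for row in dim.values():
        if row["customer_id"] == natural_key:
            row.update(changes)

apply_type1(customer_dim, "C7", {"city": "Palo Alto"})
print(customer_dim)
```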

Type 2 SCD: The Primary Workhorse

Add a row and time stamps for each change
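A Type 2 change instead expires the current row and inserts a new one with a new surrogate key. This Python sketch assumes illustrative column names (row_effective_date, row_expiration_date, is_current) and a far-future date for open rows; it expires the old row on the change date, whereas some designs use the day before.

```python
from datetime import date

# Far-future date marking the currently open row (an assumed convention).
HIGH_DATE = date(9999, 12, 31)

# Hypothetical Type 2 customer dimension.
customer_dim = [
    {"key": 1, "customer_id": "C7", "city": "Atherton",
     "row_effective_date": date(2014, 1, 1),
     "row_expiration_date": HIGH_DATE, "is_current": True},
]

def apply_type2(dim, natural_key, changes, change_date):
    """Type 2 SCD: expire the current row, then add a new row with a new surrogate key."""
    current = next(r for r in dim
                   if r["customer_id"] == natural_key and r["is_current"])
    current["row_expiration_date"] = change_date
    current["is_current"] = False
    new_row = {**current, **changes,
               "key": max(r["key"] for r in dim) + 1,
               "row_effective_date": change_date,
               "row_expiration_date": HIGH_DATE,
               "is_current": True}
    dim.append(new_row)
    return new_row["key"]

new_key = apply_type2(customer_dim, "C7", {"city": "Palo Alto"}, date(2015, 3, 1))
for row in customer_dim:
    print(row["key"], row["city"], row["row_effective_date"],
          row["row_expiration_date"], row["is_current"])
```

Fact rows loaded after the change would carry surrogate key 2, while older facts keep pointing at key 1, so history is preserved.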

The Chemistry of Fact Tables

Three types are all you ever need:
• Transaction Grain: a single point in space and time, an event
• Periodic Snapshot Grain: behavior that has occurred in a repeating period
• Accumulating Snapshot Grain: behavior during a short-lived process with a beginning and an end, possibly not finished yet

Retail Sales Fact Table

Short list of facts, unpredictably sparse or dense

Bank Account Periodic Snapshot

Open-ended list of facts, predictably dense
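A periodic snapshot can be derived from transactions by emitting one row per account per period whether or not anything happened, which is why it is predictably dense. A minimal Python sketch, assuming invented transactions, monthly periods, and balances that start at zero:

```python
from collections import defaultdict

# Hypothetical bank account transactions: (account, "YYYY-MM", amount).
transactions = [
    ("A1", "2015-01", 500.0),
    ("A1", "2015-01", -120.0),
    ("A1", "2015-03", 75.0),
    ("A2", "2015-02", 1000.0),
]
months = ["2015-01", "2015-02", "2015-03"]

def monthly_snapshot(txns, months):
    """One row per account per month, including months with no activity."""
    activity = defaultdict(float)
    counts = defaultdict(int)
    for account, month, amount in txns:
        activity[(account, month)] += amount
        counts[(account, month)] += 1
    rows = []
    for account in sorted({a for a, _, _ in txns}):
        balance = 0.0  # assumption: every account opens with a zero balance
        for month in months:
            balance += activity[(account, month)]
            rows.append({"account": account, "month": month,
                         "net_change": activity[(account, month)],
                         "txn_count": counts[(account, month)],
                         "ending_balance": balance})
    return rows

for row in monthly_snapshot(transactions, months):
    print(row)
```

Adding another fact to the snapshot is just another key in each row, which is what makes the list of facts open-ended.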

Inventory Accumulating Snapshot

Open-ended facts, many milepost dates and updates
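An accumulating snapshot has one row per pipeline instance, revisited and updated as each milepost is reached. The milestone names, lot IDs, and lag fact in this Python sketch are hypothetical; LOT-2 is still unfinished, so its later dates stay empty.

```python
from datetime import date

# Hypothetical inventory pipeline mileposts.
MILESTONES = ("received", "inspected", "stocked", "shipped")

snapshot = {}

def record_milestone(lot_id, milestone, when):
    """Update the single row for this lot when a milepost occurs."""
    row = snapshot.setdefault(lot_id, {m + "_date": None for m in MILESTONES})
    row[milestone + "_date"] = when
    # A lag fact is filled in once both of its dates exist.
    if row["received_date"] and row["shipped_date"]:
        row["days_received_to_shipped"] = (row["shipped_date"] - row["received_date"]).days

record_milestone("LOT-1", "received", date(2015, 3, 2))
record_milestone("LOT-1", "inspected", date(2015, 3, 3))
record_milestone("LOT-2", "received", date(2015, 3, 4))
record_milestone("LOT-1", "stocked", date(2015, 3, 5))
record_milestone("LOT-1", "shipped", date(2015, 3, 12))

for lot, row in snapshot.items():
    print(lot, row)
```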

The Most Powerful Result: Conformed Dimensions

The payload:

Product   Manufacturing Shipments   Warehouse Inventory   Retail Sales   Turns
Framis    2940                      1887                  761            21
Toggle    13338                     9376                  2448           14
Widget    7566                      5748                  2559           23

What is special about the 4th column?

Creating and publishing conformed dimensions and facts is 50% technical and 50% political.
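The payload table is what drilling across conformed dimensions produces: each process is queried separately, grouped by the same conformed Product attribute, and the answer sets are merged on that row header. The Python sketch below assumes invented fact rows whose totals match the Framis and Toggle rows above; inventory is given as one snapshot value per product, since inventory levels should not be summed across dates.

```python
from collections import defaultdict

# Hypothetical fact rows from three separate business processes, all using
# the same conformed product attribute.
shipments = [("Framis", 1500), ("Framis", 1440), ("Toggle", 13338)]
inventory = [("Framis", 1887), ("Toggle", 9376)]  # one snapshot date only
retail_sales = [("Framis", 761), ("Toggle", 2000), ("Toggle", 448)]

def summarize(facts):
    """Query one fact table: group by the conformed attribute and sum the measure."""
    totals = defaultdict(int)
    for product, value in facts:
        totals[product] += value
    return totals

def drill_across(**answer_sets):
    """Merge separately computed answer sets on their shared row headers."""
    summaries = {name: summarize(facts) for name, facts in answer_sets.items()}
    products = sorted(set().union(*summaries.values()))
    return [{"product": p, **{name: s.get(p) for name, s in summaries.items()}}
            for p in products]

for row in drill_across(shipments=shipments, inventory=inventory,
                        retail_sales=retail_sales):
    print(row)
```

The merge only works because every process labels products identically; that shared labeling is what conformance buys.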

Advanced Techniques

• Factless fact tables
• Hybrid SCD types
• Many-valued dimensions and bridge tables
• Ragged hierarchies of indeterminate depth
• Rapidly changing monster dimensions
• Full list of 72 techniques in the latest data modeling book; full chapter on the website
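For ragged hierarchies of indeterminate depth, one common approach is a hierarchy bridge table listing every ancestor/descendant pair with its depth. A minimal Python sketch, assuming an invented parent map with no cycles and only the (ancestor, descendant, depth) columns:

```python
# Hypothetical ragged organization hierarchy as child -> parent.
# Branches have different depths, which is what makes it "ragged".
parent_of = {
    "East": "Corp", "West": "Corp",
    "Boston": "East",
    "Back Bay": "Boston",
    "Seattle": "West",
}

def build_bridge(parent_of):
    """Return (ancestor, descendant, depth) rows, including depth-0 self rows."""
    nodes = set(parent_of) | set(parent_of.values())
    bridge = []
    for node in nodes:
        bridge.append((node, node, 0))
        depth, current = 0, node
        while current in parent_of:  # walk up until the root
            current = parent_of[current]
            depth += 1
            bridge.append((current, node, depth))
    return sorted(bridge)

bridge = build_bridge(parent_of)
# Everything that rolls up to "East", at any depth:
print(sorted(d for a, d, _ in bridge if a == "East"))
```

Joining a fact table through the bridge on the descendant key, and constraining the ancestor, rolls up everything under a node regardless of how many levels separate them.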

Manage with the DW Bus Matrix

Drive architecture, project management, communication
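A bus matrix is simply business processes (rows) crossed with conformed dimensions (columns). The process and dimension names in this Python sketch are illustrative, loosely drawn from earlier slides; shared_dimensions shows which row headers two processes could be drilled across on.

```python
# Hypothetical bus matrix: each process maps to the conformed dimensions it uses.
DIMENSIONS = ["Date", "Product", "Store", "Promotion", "Warehouse"]
BUS_MATRIX = {
    "Manufacturing Shipments": {"Date", "Product", "Warehouse"},
    "Warehouse Inventory":     {"Date", "Product", "Warehouse"},
    "Retail Sales":            {"Date", "Product", "Store", "Promotion"},
}

def render(matrix, dims):
    """Render the matrix as text, with an X where a process uses a dimension."""
    width = max(len(p) for p in matrix)
    lines = [" " * width + "  " + "  ".join(dims)]
    for process, used in matrix.items():
        cells = "  ".join(("X" if d in used else "").center(len(d)) for d in dims)
        lines.append(process.ljust(width) + "  " + cells)
    return "\n".join(lines)

def shared_dimensions(matrix, *processes):
    """Dimensions common to all given processes: candidate drill-across headers."""
    return set.intersection(*(matrix[p] for p in processes))

print(render(BUS_MATRIX, DIMENSIONS))
print(sorted(shared_dimensions(BUS_MATRIX, "Retail Sales", "Warehouse Inventory")))
```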

Big Data Use Cases

• Behavior tracking
• Search ranking
• Ad tracking
• Location and proximity tracking
• Causal factor discovery
• Social CRM: share of voice, audience engagement, conversation reach, active advocates, advocate influence, advocacy impact, resolution rate, resolution time, satisfaction score, topic trends, sentiment ratio, and idea impact
• Financial account fraud detection/intervention
• System hacking detection/intervention
• Online game gesture tracking

More Big Data Use Cases

• Non-numeric data and unique algorithms
  - Document similarity testing
  - Genomics analysis
  - Cohort group discovery
  - Satellite image comparison
  - CAT scan comparisons
  - Big science data collection
• Complex numeric data
  - Smart utility meters
  - Building sensors
  - In-flight aircraft status
• Data bags: name/value pairs with ad hoc content

Houston: We Have a Problem

The traditional pure relational data warehouse architecture can’t handle ANY of these use cases. We need:
• Non-scalar data: vectors, arrays, data bags, structured text, free text, images, waveforms
• Iterative logic, complex branching, advanced statistics
• Petabyte data sources loaded at gigabytes/second
• Analysis in place across thousands of distributed processors, data often not in database format, full data scans often needed
• Data loaded before structure is understood
• Analysis while loading

Hadoop for Exploratory DW/BI

• HDFS is NOT a database; it’s a file system!
• Sources: Transactions, Free text, Images, Machines/Sensors, Links/Networks, EDW overflow
• HDFS primary files: industry standard HW; fault tolerant; replicated; write once(!); agnostic content; scalable to “infinity”
• Metadata (system table): HCatalog. All clients can use this to read files.
• SQL query engines: Hive, Impala, others… These are query engines, not databases!
• BI tools: Tableau, Bus Obj, Cognos, QlikView, others…
• Purpose built for EXTREME I/O speeds; use ETL tool or Sqoop

Starting a Data Warehouse Career

• Join a DW/BI project (leverage your contacts)
• Experience trumps everything
  - ETL tools, BI tools, databases, Java, SQL, …
• Migrate over time among:
  - Business (end user department)
  - Technology (IT architecture team, ETL development)
  - People (user interface design, BI development)
• Read! For instance:

The Kimball Group Resource

• www.kimballgroup.com
• Best-selling data warehouse books
  - NEW BOOK! The Classic “Toolkit” 3rd Ed.
• White papers on Integration, Data Quality, and Big Data Analytics
• Cloudera Webinars (www.Cloudera.com)
  - Hadoop 101 for EDW Professionals
  - EDW 101 for Hadoop Professionals