Dimensional Modelling

Design Modelling

Page 1: Design Modelling

Dimensional Modelling

Page 2: Design Modelling

Dimensional modeling has two basic concepts:

• Facts
• Dimensions

Other related concepts:

• Aggregates
• Meta-data

Dimensional modeling is a technique for conceptualizing and visualizing data models as a set of measures that are described by common aspects of the business.

Dimensional Modelling

Page 3: Design Modelling

Fact

Definition

• A fact is a collection of related data items, consisting of measures

• A fact is a focus of interest for the decision making process.

• Measures are continuously valued attributes that describe facts (Golfarelli et al.)

• A fact is a business measure (Kimball and Ross)

What exactly is being analysed? What numbers are being analysed?

Page 4: Design Modelling

Examples of Facts

A university provides education services to its students. What are its facts and measures?

Facts                 Measures
--------------------  --------------------------------------
Applications          number, revenue from prospectus sales
Enrollment            number, revenue
Student Performance   grades, marks, %age marks, division
Student Placement     designation, nature of job, salary
Student awards        title, amount

Page 5: Design Modelling

Each fact typically represents

• a business item: an order

• a business transaction: order processing

• an event: arrival of an order

that can be used in analyzing the business or business processes.

Fact

Page 6: Design Modelling

Some Aspects of Facts

A fact is continuously valued: it takes a value from a broad range of values, such as

• the set of integers
• the real numbers

The most useful facts are numeric and additive: we almost never work with a single fact, so facts must combine meaningfully across many rows

Textual facts occur very rarely: their free format and unpredictable contents make it impossible to analyse them

Recent interest in unstructured data warehouses does look at these

Page 7: Design Modelling

Types of Facts

• Additive: facts that can be summed up through all of the dimensions in the fact table, e.g. Sales_Amount along date and product

• Semi-additive: facts that can be summed up for some of the dimensions in the fact table, but not the others, e.g. current_balance can be summed along account but not along date

• Non-additive: facts that cannot be summed up for any of the dimensions present in the fact table, e.g. a percentage or a profit margin
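The three types can be illustrated with a minimal Python sketch. All rows, account names, and amounts below are hypothetical and chosen only to show which sums are meaningful.

```python
from collections import defaultdict

# Hypothetical fact rows: (date, account, sales_amount, current_balance)
rows = [
    ("2024-01-01", "A", 100, 500),
    ("2024-01-01", "B", 200, 300),
    ("2024-01-02", "A", 150, 650),
    ("2024-01-02", "B", 50, 350),
]

# Additive: sales_amount can be summed across every dimension.
total_sales = sum(sales for _date, _account, sales, _balance in rows)

# Semi-additive: current_balance can be summed across accounts for one date ...
balance_by_date = defaultdict(int)
for date, _account, _sales, balance in rows:
    balance_by_date[date] += balance

# ... but not across dates; per account we keep the latest balance instead.
latest_balance = {}
for _date, account, _sales, balance in sorted(rows):
    latest_balance[account] = balance

# Non-additive: margins cannot be summed; recompute the ratio from
# its additive parts (hypothetical revenue and profit per product).
revenue_profit = {"X": (100, 10), "Y": (300, 90)}
overall_margin = (
    sum(profit for _rev, profit in revenue_profit.values())
    / sum(rev for rev, _profit in revenue_profit.values())
)

print(total_sales, dict(balance_by_date), latest_balance, overall_margin)
```

Note that the overall margin comes from summed profit over summed revenue, not from adding or averaging the per-product margins.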

Page 8: Design Modelling

Dimension

Definition

• The parameter over which we want to perform analysis of facts
  – sales is a fact; perform analysis over region, product, time

• The parameter that gives meaning to a measure
  – number of customers is a fact; perform analysis over time

• Discretely valued description that is more or less constant and participates in constraints

• Qualifying characteristics that provide additional perspective to a given fact

Page 9: Design Modelling

Examples of Dimension

A university provides education services to its students. What are its facts and dimensions?

Facts            Dimension
---------------  -------------------------
Applications     Age, Region
Enrollment       Region
Performance      Year, Discipline, Student
Placement        Year, Discipline, Grades
Student awards   Discipline, Year

Page 10: Design Modelling

Dimensions and their Values

Dimension    Dimension Value
-----------  ------------------
Age          10, 11, 12, ...
Region       North, South
Year         1999, 2000, ...
Discipline   ECE, CSE, IT, ...
Grades       A+, A, ...
Student      Name of student

Page 11: Design Modelling

Aspects of Dimensions

The values of dimensions are expected not to change with time, but some do:
  – slowly changing dimensions
  – rapidly changing dimensions

Need to handle such changes

Dimensions are the primary source of query constraints, report headings, and groupings

Page 12: Design Modelling

Dimension Hierarchies/Categories

Dimensions are composed of smaller units called categories or members

• simpler components forming a hierarchy: country, zone, branch, unit
  – Hierarchies are a basis for drill down and roll-up

• special, notable units: holidays
  – For special queries: sales performance on holidays

Page 13: Design Modelling

Organising Facts and Dimensions

The model should

• provide drill down/roll up along dimension hierarchies
• provide good data access
• be query centric
• be optimised for queries and analysis
• allow each dimension to interact fully with the fact

Page 14: Design Modelling

The Star Schema

[Figure: a central Fact connected to five surrounding Dimensions]

A DW is a collection of star schemata

Page 15: Design Modelling

Example: Facts and Dimensions

[Figure: star schema example. Fact: Sales, measured in Rupees. Dimension hierarchies: Year, Season, Month; Region, City; Product type, Product name]

Page 16: Design Modelling

Computing Fact Sizes

Assume one sales fact per product, per city, per month

Let there be

• 5000 products
• 60 months
• 50 cities

Number of sales facts = 5000 * 60 * 50 = 15,000,000
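The count is simply the product of the dimension cardinalities. A minimal sketch, using the figures from this slide (the dictionary keys are illustrative names only):

```python
from math import prod

# Cardinality of each dimension at the lowest level of its hierarchy
cardinalities = {"product": 5000, "month": 60, "city": 50}

# One fact per combination of product, month, and city
max_facts = prod(cardinalities.values())

print(f"{max_facts:,}")  # 15,000,000
```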

Page 17: Design Modelling

Sparse Facts

Not all 5000 products may be sold each month in each city

Assume that 3000 products are sold each month in each city

Number of sales facts = 3000 * 60 * 50 = 9000000

Approximately 60% of the cube is occupied and 40% is empty
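The occupancy figure can be checked with a few lines of Python. This sketch uses only the numbers on the slide; the variable names are illustrative.

```python
products, months, cities = 5000, 60, 50
products_sold_per_month_per_city = 3000

# Every possible cell of the product x month x city cube
cube_cells = products * months * cities

# Cells that actually hold a sales fact
occupied_cells = products_sold_per_month_per_city * months * cities
empty_cells = cube_cells - occupied_cells

density = occupied_cells / cube_cells   # share of the cube that is occupied
sparsity = empty_cells / cube_cells     # share of the cube that is empty

print(f"occupied: {density:.0%}, empty: {sparsity:.0%}")
```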

Page 18: Design Modelling

Aggregation

We need the total sales for each region, product-wise and month-wise

• Number of products = 5000
• Number of regions = 5
• Number of months = 60

Total number of facts = 5000 * 5 * 60 = 1,500,000

Space-time tradeoff: if the frequency of use is high, then pay the storage expense

Aggregation guideline: if the number of facts summarised is more than 10, then do aggregation

Aggregation is performed in order to speed up common queries

Aggregates are pre-calculated summaries along dimension hierarchies derived from basic facts.
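A pre-calculated aggregate of this kind can be sketched as a roll-up from city to region. The city-to-region mapping, product names, and amounts below are hypothetical, and `aggregate_by_region` is an illustrative helper, not part of any library.

```python
from collections import defaultdict

# Hypothetical hierarchy: each city belongs to exactly one region
city_region = {"Delhi": "North", "Chandigarh": "North", "Chennai": "South"}

# Hypothetical base facts: (product, month, city, rupees)
base_facts = [
    ("Soap", "2024-01", "Delhi", 120),
    ("Soap", "2024-01", "Chandigarh", 80),
    ("Soap", "2024-01", "Chennai", 50),
    ("Tea", "2024-01", "Delhi", 200),
    ("Tea", "2024-02", "Delhi", 210),
]

def aggregate_by_region(facts, city_region):
    """Roll sales up the location hierarchy from city to region."""
    totals = defaultdict(int)
    for product, month, city, rupees in facts:
        totals[(product, month, city_region[city])] += rupees
    return dict(totals)

# Computed once and stored, so region-level queries avoid scanning base facts
region_aggregate = aggregate_by_region(base_facts, city_region)

for key, rupees in sorted(region_aggregate.items()):
    print(key, rupees)
```

The stored aggregate trades extra space for faster region-level queries, which is the space-time tradeoff described above.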

Page 19: Design Modelling

Aggregation

[Figure: the dimension hierarchies Year, Season, Month; Region, City; Product type, Product name, shown with no aggregation, one-way aggregation, two-way aggregation, and three-way aggregation]

When aggregation is done by rising along n dimensions, n-way aggregation is said to be performed

Page 20: Design Modelling

Sparsity and Aggregation

As the amount of aggregation increases, sparsity decreases

One-way aggregation on regions results in 1.5M facts

The probability of all 5000 products being sold in a month in a region is higher than that of all 5000 being sold in a month in a city

Two-way aggregation on regions and seasons results in 0.5M facts

The probability of all 5000 products being sold in a season in a region is higher than that of all 5000 being sold in a month in a region
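The 1.5M and 0.5M figures can be reproduced as follows. The 0.5M figure is consistent with a season of three months, which is an assumption made here and not stated on the slide.

```python
products, cities, regions, months = 5000, 50, 5, 60

# Assumption: a season spans 3 months, giving 20 seasons in 60 months
months_per_season = 3
seasons = months // months_per_season

no_aggregation = products * months * cities   # product x month x city
one_way = products * months * regions         # city rolled up to region
two_way = products * seasons * regions        # month rolled up to season as well

print(no_aggregation, one_way, two_way)
```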

Page 21: Design Modelling

Aggregation and the Star Schema

Each aggregate is a fact with its own derived dimensions

Derived dimensions may be defined ‘on the fly’

Sales summary by quarter, but quarter was not in the original dimension hierarchy

Each aggregate has its own star schema

Page 22: Design Modelling

Metadata

Metadata contains the answers to questions about the data in the Data Warehouse

Different definitions:

• Data about the data
• Table of contents for the data
• Catalog for the data
• Data warehouse atlas
• Data warehouse roadmap

Page 23: Design Modelling

Central Role of Metadata

Page 24: Design Modelling

Metadata for End Users

Page 25: Design Modelling

Example

Entity Name:          Customer
Aliases:              Account, Client
Definition:           Anyone who purchases hotel rooms
Source Systems:       Reservations, Accounts, Housekeeping
Create Date:          1 January 2000
Last Update Date:     13 September 2003
Update Cycle:         weekly
Full Refresh Cycle:   six months
Data Quality Review:  15 September 2003
Planned Archival:     Every six months
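An end-user metadata entry like this one could be held as a structured record. The sketch below is only one possible representation; the class name `EntityMetadata` and its field names are hypothetical, and the values are those of the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

@dataclass
class EntityMetadata:
    """Hypothetical record for one business entity in the warehouse."""
    entity_name: str
    aliases: List[str]
    definition: str
    source_systems: List[str]
    create_date: date
    last_update_date: date
    update_cycle: str
    full_refresh_cycle: str
    data_quality_review: date
    planned_archival: str

customer = EntityMetadata(
    entity_name="Customer",
    aliases=["Account", "Client"],
    definition="Anyone who purchases hotel rooms",
    source_systems=["Reservations", "Accounts", "Housekeeping"],
    create_date=date(2000, 1, 1),
    last_update_date=date(2003, 9, 13),
    update_cycle="weekly",
    full_refresh_cycle="six months",
    data_quality_review=date(2003, 9, 15),
    planned_archival="Every six months",
)

# An end user can look up what "Client" means and where the data comes from
print(customer.definition, customer.source_systems)
```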

Page 26: Design Modelling

Metadata for IT Professionals

Page 27: Design Modelling

Metadata Driven Data Warehouse Process

Page 28: Design Modelling

Data Acquisition Metadata Types

Page 29: Design Modelling

Information Delivery

• Functions:
  – Report generation
  – Query processing
  – Complex analysis

• Metadata recorded in the information delivery functional area
  – relate to predefined queries, predefined reports, and input parameter definitions for queries and reports
  – also include information for OLAP.

Page 30: Design Modelling

Information Delivery Metadata Types

Page 31: Design Modelling

Challenges for Metadata Management

• Reconcile the formats of metadata of several tools
• No industry-wide accepted standards
• Centralized metadata repository: a collection of fragmented metadata stores
• No easy and accepted methods of passing metadata
• Preserving version control of metadata
• Unifying the metadata relating to the data sources can be an enormous task

Page 32: Design Modelling

Common Warehouse Model

Foundation Metadata

• Business information about model elements
• Data types
• Keys and Indexes
• Expression
• Software Deployment: software deployed in DW
• Type Mapping: mapping of data types between different systems

Page 33: Design Modelling

Common Warehouse Model

Metadata for Resource

• Relational data sources
• Record data sources
• Multidimensional resources
• XML data sources

Analysis Metadata

• Data transformation tools
• OLAP processing tools
• Data mining tools
• Information visualisation tools
• Business taxonomy and glossary

Page 34: Design Modelling

Common Warehouse Model

Management

• Warehouse Processes
• Results of Warehouse Operations

Page 35: Design Modelling

The Star Schema Revisited

The Star contains ‘detailed’ facts and dimensions

Aggregates are facts and have their own dimensions

Meta-data support is built around the star schema

Page 36: Design Modelling

Star Schema: Benefits

• Depicts a fuller description of each dimension

• Explicitly shows multiple levels of aggregation on each dimension

• Depicts multiple facts at the intersection of all dimensions

• Directly implementable in a Relational DBMS

• Can utilize new, accelerated approaches to indexing (STARindex) and joining (STARjoin)

Page 37: Design Modelling

Dimensional Modelling vs. Spread Sheet

Annual product sales by region ($,000)
============================================================
          REGION:
PRODUCT:    SOUTHERN   WESTERN  NORTHERN   EASTERN     TOTAL
------------------------------------------------------------
Stibes        $7,140   $14,790   $13,260   $15,810   $51,000
Farkles        5,460    11,310    10,140    12,090    39,000
Teglers        3,150     6,525     5,850     6,975    22,500
Qwerts         5,250    11,875    10,750    12,625    40,500
------------------------------------------------------------
TOTALS:      $21,000   $44,500   $40,000   $47,500  $153,000
============================================================

• Is this a Relational Table?
• What is the Entity?
• What is the Identifier?
• What are the Attributes?

How to make it a Relational Table?

• How many Fact types?
• How many Dimensions?

Page 38: Design Modelling

Dimensional Modelling vs. Relations

REGION     PRODUCT    SALES
---------------------------
Southern   Stibes     $7,140
Southern   Farkles     5,460
Southern   Teglers     3,150
Southern   Qwerts      5,250
Western    Stibes     14,790
Western    Farkles    11,310
Western    Teglers     6,525
Western    Qwerts     11,875
Northern   Stibes     13,260
Northern   Farkles    10,140
Northern   Teglers     5,850
Northern   Qwerts     10,750
Eastern    Stibes     15,810
Eastern    Farkles    12,090
Eastern    Teglers     6,975
Eastern    Qwerts     12,625

(all)      Stibes     51,000
(all)      Qwerts     40,500
Southern   (all)      21,000
(all)      (all)     153,000
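Turning the spreadsheet into this relational form is an unpivot: one row per (region, product) pair, with the "(all)" rows derived by summation. A minimal sketch using the figures from the spreadsheet; the helper `total` is illustrative only.

```python
regions = ["Southern", "Western", "Northern", "Eastern"]

# Spreadsheet body: one list of regional sales per product ($,000)
sheet = {
    "Stibes":  [7140, 14790, 13260, 15810],
    "Farkles": [5460, 11310, 10140, 12090],
    "Teglers": [3150, 6525, 5850, 6975],
    "Qwerts":  [5250, 11875, 10750, 12625],
}

# Unpivot: one (region, product, sales) fact row per spreadsheet cell
fact_rows = [
    (region, product, values[i])
    for i, region in enumerate(regions)
    for product, values in sheet.items()
]

def total(region=None, product=None):
    """Sum sales, treating None as '(all)' for that dimension."""
    return sum(
        sales
        for r, p, sales in fact_rows
        if region in (None, r) and product in (None, p)
    )

print(len(fact_rows))            # 16 detailed rows
print(total(product="Stibes"))   # (all), Stibes
print(total(region="Southern"))  # Southern, (all)
print(total())                   # (all), (all)
```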

How many Facts?

How many Dimensions?

What type of Table?

What is the Identifier?

Where are the Dimension Tables?

REGION:
NAME       LEVEL
---------------------
(all)      1
Southern   2
Western    2
Northern   2
Eastern    2

Page 39: Design Modelling

ER Diagram

REGION: Region ID, Region Name

DEPARTMENT: Department ID, Department Name

PRODUCT GROUP: Product Group ID, Product Group Desc., Department ID (fk)

PRODUCT: Product ID, Product Desc., Product Group ID (fk)

STORE: Store ID, Store Name, Address, City, State, ZipCode, Region ID (fk)

SALES: Sales Date, Store ID (fk), Product ID (fk), Sale Amount, Sale Units

INVENTORY: Week, Store ID (fk), Product ID (fk), Quantity

Page 40: Design Modelling

ER Diagram

Good for OLTP

Update in exactly one place

No redundancy

Oriented towards insertion, deletion, and modification of data

Weak entities/relationships create normalised structures

What are the facts and dimensions?

Page 41: Design Modelling

Transformation of ER to Star

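[Figure: the ER entities grouped into three dimensions around SALES FACTS]

• Product Dimension: DEPARTMENT, PRODUCT GROUP, PRODUCT ITEM
• Location Dimension: REGION, STORE
• Time Dimension: YEAR, MONTH, WEEK, DATE
• SALES FACTS at the centre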
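One way to picture the transformation is to flatten the normalised product tables into a single product dimension by following the foreign keys. The sketch below assumes that reading; the ids and descriptions are hypothetical, and `build_product_dimension` is an illustrative helper.

```python
# Hypothetical normalised (ER-style) tables, keyed by id
departments = {1: "Grocery"}
product_groups = {10: ("Beverages", 1)}   # group id -> (description, department id)
products = {
    100: ("Green tea", 10),               # product id -> (description, group id)
    101: ("Coffee", 10),
}

def build_product_dimension(products, product_groups, departments):
    """Flatten product -> product group -> department into one row per product."""
    dimension = {}
    for product_id, (product_desc, group_id) in products.items():
        group_desc, department_id = product_groups[group_id]
        dimension[product_id] = {
            "product": product_desc,
            "product_group": group_desc,
            "department": departments[department_id],
        }
    return dimension

product_dimension = build_product_dimension(products, product_groups, departments)

# Each row now carries its whole hierarchy, ready for roll-up and drill down
print(product_dimension[100])
```

The location and time dimensions could be built the same way, leaving SALES FACTS with one key per dimension plus its measures.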