yoganand-reddy-sankepalli
8/9/2019 Dimensional Design V26.6.02
1/123
Data Modeling
Levels of modeling
[Figure: Business Process, Conceptual, Logical Model, Physical Model]
Levels of modeling
• Conceptual modeling
  - Describe data requirements from a business point of view, without technical details
• Logical modeling
  - Refine conceptual models
  - Data-structure oriented, platform independent
• Physical modeling
  - Detailed specification of what is physically implemented using a specific technology
Conceptual Model
• A conceptual model shows data through business eyes.
• All entities which have business meaning.
• Important relationships.
• Few significant attributes in the entities.
• Few identifiers or candidate keys.
Logical Model
• Replaces many-to-many relationships with associative entities.
• Defines a full population of entity attributes.
• May use non-physical entities for domains and sub-types.
• Establishes entity identifiers.
• Has no specifics for any RDBMS or configuration.
• ER Diagram, Key Based Modeling, Fully Attributed Model
Physical Model
• A physical data model may include
  - Referential integrity
  - Indexes
  - Views
  - Alternate keys and other constraints
  - Tablespaces and physical storage objects
Dimensional Modeling
• Dimensional modeling is the pillar of a data warehouse.
• It begins once the business requirements are clear and the KPIs have been identified.
• Typically the logical data model is delivered in the analysis phase, and the physical data model comes in the design stage of the DW project life cycle.
The Business Processes
[Figure: Operations, Sales and Marketing, Customer Services, Product Development]
• A series of interrelated business processes which contribute to increased product value for the customer, and to profit for the enterprise (Porter, 1985)
The Business Processes
• Businesses constantly strive to optimize each process in the value chain
• Optimization requires measuring and analyzing the effectiveness of each process as well as the value chain as a whole
[Figure: Operations, Sales and Marketing, Customer Services, Product Development]
OLTP Systems
[Figure: OLTP systems (Manufacturing and Process Control; Sales Order Entry and Campaign Management; Customer Support and Relationship Management; Shipping and Inventory Management) shown against the business processes (Operations, Sales and Marketing, Customer Services, Product Development)]
OLTP Data Model
• Focus of OLTP design
  - Individual data elements
  - Data relationships
• Design goals
  - Accurately model business
  - Remove redundancy
Why OLTP Design is Bad for a DW
• Complex
• Unfamiliar to business people
• Incomplete history
• Slow query performance
The Solution - Dimensional Model
• Logical modeling technique
  - For designing relational database structures
• Addresses OLTP design shortcomings
  - For use in analytic systems
• First developed in the early 1980's
  - Packaged goods industry
• Popularized by Ralph Kimball, PhD.
  - 1996 book: 'The Data Warehouse Toolkit'
Dimensional Modeling Basics
Typical business requirements
Process-oriented business questions
• 'I need to see overall gross margin by category'
• 'What are outstanding receivables by G/L account?'
• 'How do inventory levels compare with sales by product and warehouse?'
• 'What is the return rate for each supplier?'
[Figure: Operations, Sales and Marketing, Customer Services, Product Development]
Key Performance Indicators
Process-oriented business measures
• gross margin
• receivables
• inventory levels, sales
• return rate
[Figure: Operations, Sales and Marketing, Customer Services, Product Development]
Process Measurement - What We See
Coffee Maker Fulfillment Report (Brand: Captain Coffee)
Product                 Units Sold   Units Shipped   % Shipped
Standard Coffee Maker        5,000           3,800         76%
Thermal Coffee Maker         2,400           1,632         68%
Deluxe Coffee Maker          2,073           1,658         80%
All Products                 9,473           7,090         75%
(The Units Sold, Units Shipped and % Shipped columns are labeled 'Facts'.)
• Measures
  - Metrics or indicators by which people evaluate a business process
  - Referred to as "Facts"
• Examples
  - Margin
  - Inventory Amount
  - Sales Dollars
  - Receivable Dollars
  - Return Rate
Process Perspectives - Across What
[Figure: the Coffee Maker Fulfillment Report (Units Sold, Units Shipped and % Shipped for Standard, Thermal and Deluxe Coffee Maker and All Products, brand Captain Coffee), with the Brand and Product columns labeled 'Dimensions']
• Dimensions
  - The parameters by which measures are viewed
  - Used to break out, filter or roll up measures
  - Often found after the word "by" in a business question
  - Descriptive business terms
• Examples
  - Product
  - Warehouse
  - Customer
  - Supplier
Dimensional Model
• Definition
  - Logical data model used to represent the measures and dimensions that pertain to one or more business subject areas
  - Dimensional Model = Star Schema
• Serves as basis for the design of a relational database schema
• Can easily translate into multi-dimensional database design if required
• Overcomes OLTP design shortcomings
Dimensional Model Advantages
• Understandable
• Systematically represents history
• Reliable join paths
• High performance query
• Enterprise scalability
Data Warehousing Schema
[Figure: Star Schema with a central Facts table joined to Store, Time and Product]
• Fewer tables
  - Denormalized
  - Consolidated
• Dimensional
  - Familiar to users
  - Facts go in the fact tables
  - Dimensions in dimension tables
• Increases understandability
Fewer Join Paths - Faster Query Performance
• Star schema joins
  - Defined during schema design, not at runtime
  - Business people can easily understand these relationships
  - One-to-many relations between dimensions and facts
  - Referential integrity always enforced
Other Advantages
• Deterministic query patterns
• Star schema query optimization supported by all major RDBMS vendors
Subject Area Models - Data Marts
[Figure: a subject area E/R model and a subject area dimensional model, drawn for the OLTP systems (Manufacturing and Process Control; Sales Order Entry and Campaign Management; Customer Support and Relationship Management; Shipping and Inventory Management) and the business processes (Operations, Sales and Marketing, Customer Services, Product Development)]
Enterprise Models
[Figure: enterprise scope E/R model and enterprise scope dimensional model]
Dimensional Modeling Design Details
Dimension Keys - Generated for DW
[Figure: three Dimension tables, each with its own key]
• Synthetic keys
  - Each table is assigned a unique primary key, specifically generated for the data warehouse
  - Primary keys from source systems may be present in the dimension, but are not used as primary keys in the star schema
Dimension Attributes
[Figure: three Dimension tables, each with a Key and several attributes]
• Dimension attributes
  - Specify the way in which measures are viewed: rolled up, broken out or summarized
  - Often follow the word "by", as in "Show me Sales by Region and Quarter"
  - Frequently referred to as 'Dimensions'
Fact Tables
[Figure: Fact Table with columns fact1, fact2, fact3]
• Process measures
  - Start by assigning one fact table per business subject area
  - Fact tables store the process measures (Facts)
  - Compared to dimension tables, fact tables usually have a very large number of rows
Fact Table Primary Key
[Figure: Fact Table with three key columns followed by fact1, fact2, fact3]
• Every fact table
  - Multi-part primary key added
  - Made up of foreign keys referencing dimensions
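The key rules above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table and column names (dealer_dim, source_dealer_id, time_dim, sales_facts) and the sample rows are assumptions, not part of the deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE dealer_dim (
    dealer_key       INTEGER PRIMARY KEY,  -- synthetic key generated for the warehouse
    source_dealer_id TEXT,                 -- source-system key, kept only as an attribute
    dealer TEXT, city TEXT, state TEXT, region TEXT
);
CREATE TABLE time_dim (
    time_key INTEGER PRIMARY KEY,
    date TEXT, month TEXT, quarter TEXT, year INTEGER
);
CREATE TABLE sales_facts (
    time_key   INTEGER NOT NULL REFERENCES time_dim(time_key),
    dealer_key INTEGER NOT NULL REFERENCES dealer_dim(dealer_key),
    revenue    REAL,
    quantity   INTEGER,
    PRIMARY KEY (time_key, dealer_key)     -- multi-part key made of the foreign keys
);
""")
conn.execute("INSERT INTO dealer_dim (source_dealer_id, dealer) VALUES ('D-0042', 'Example Motors')")
conn.execute("INSERT INTO time_dim (date, year) VALUES ('1997-01-15', 1997)")
conn.execute("INSERT INTO sales_facts VALUES (1, 1, 1000.0, 2)")


def insert_orphan_fact():
    """Try to load a fact whose dealer_key has no matching dimension row."""
    try:
        conn.execute("INSERT INTO sales_facts VALUES (1, 99, 50.0, 1)")
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


orphan_result = insert_orphan_fact()
print(orphan_result)
```

The fact row with an unknown dealer_key is refused, which is the referential integrity the star schema relies on.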
Fact Table Sparsity
• Sparsity
  - Term used to describe the very common situation where a fact table does not contain a row for every combination of every dimension table row for a given time period
  - Because fact tables contain a very small percentage of all possible combinations, they are said to be sparsely populated, or sparse
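A small worked example of sparsity, using made-up numbers: the dimension sizes and the fact rows below are assumptions chosen only to show the arithmetic.

```python
# Hypothetical dimension sizes for a single day of sales.
n_products, n_stores = 100, 20
possible_rows = n_products * n_stores  # every product/store combination for that day

# Fact rows actually recorded: only (product_key, store_key) pairs that had a sale.
fact_rows = {(1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (1, 2), (3, 2), (5, 2)}

density = len(fact_rows) / possible_rows
print(f"{len(fact_rows)} of {possible_rows} combinations present ({density:.1%})")
```

Only combinations that actually occurred are stored, so the table holds 8 rows rather than 2,000.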
Fact Table Grain
• Grain
  - The level of detail represented by a row in the fact table
  - Must be identified early
  - Cause of greatest confusion during design process
• Example
  - Each row in the fact table represents the daily item sales total
Designing a Star Schema
• Five initial design steps
• Based on Kimball's six steps
• Start designing in order
• Re-visit and adjust over project life
Fact Table Details
Example Fact Table Records
Sales Facts
time_key  model_key  dealer_key  revenue     quantity
1         1          1           7584.27     2
1         2          1           15226.37    3
1         3          1           2836.15     1
1         4          1           132675.22   4
1         5          1           43789.45    1
1         1          2           35678.98    1
1         3          2           57864.78    2
1         5          2           926.67      2
(Primary Key: time_key, model_key, dealer_key. Facts: revenue, quantity.)
Example: Additive Facts
[Figure: star schema]
Sales Facts: model_key, dealer_key, time_key, revenue, quantity
Model: model_key, brand, category, line, model
Time: time_key, year, quarter, month, date
Dealer: dealer_key, region, state, city, dealer
Types of Facts
• Semi-additive
  - Can be summed across most dimensions but not all
  - Examples: inventory quantities, account balances, or personnel counts
  - Anything that measures a "level"
  - Must be careful with ad-hoc reporting
  - Often aggregated across the "forbidden dimension" by averaging
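A minimal sketch of the semi-additive rule, assuming time is the "forbidden dimension" for an inventory level. The dates, product names and quantities are invented for illustration.

```python
from collections import defaultdict

# (date, product, quantity_on_hand): hypothetical daily inventory levels
inventory = [
    ("1997-01-15", "standard", 100), ("1997-01-15", "deluxe", 40),
    ("1997-01-16", "standard", 90),  ("1997-01-16", "deluxe", 50),
    ("1997-01-17", "standard", 80),  ("1997-01-17", "deluxe", 30),
]

# Summing across products for one date is meaningful.
on_hand_by_date = defaultdict(int)
for date, _product, qty in inventory:
    on_hand_by_date[date] += qty

# Summing across dates is not: 100 + 90 + 80 units were never on hand at once.
# Across time, average the daily levels instead.
levels_by_product = defaultdict(list)
for _date, product, qty in inventory:
    levels_by_product[product].append(qty)
avg_on_hand = {p: sum(v) / len(v) for p, v in levels_by_product.items()}

print(dict(on_hand_by_date))
print(avg_on_hand)
```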
Example: Semi-additive Facts
[Figure: star schema]
Sales Facts: model_key, dealer_key, time_key, inventory
Model: model_key, brand, category, line, model
Time: time_key, year, quarter, month, date
Dealer: dealer_key, region, state, city, dealer
Types of Facts
• Non-Additive
  - Cannot be summed across any dimension
  - All ratios are non-additive
  - Break down to fully additive components, store them in fact table
Example: Non-Additive Facts
• Margin_rate is non-additive
• Margin_rate = margin_amt/revenue
[Figure: star schema]
Sales Facts: model_key, dealer_key, time_key, revenue, margin_amt
Model: model_key, brand, category, line, model
Time: time_key, year, quarter, month, date
Dealer: dealer_key, region, state, city, dealer
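The margin rate example can be worked through with two invented fact rows. The revenue and margin_amt values are assumptions; the point is only that the ratio has to be recomputed from the summed components.

```python
# Hypothetical fact rows: (dealer_key, revenue, margin_amt)
rows = [(1, 1000.0, 100.0), (2, 100.0, 50.0)]

# Misleading: combining per-row rates, here by averaging them.
row_rates = [margin / revenue for _, revenue, margin in rows]  # 0.10 and 0.50
naive_avg = sum(row_rates) / len(row_rates)

# Correct: sum the additive components first, then take the ratio.
total_revenue = sum(revenue for _, revenue, _ in rows)
total_margin = sum(margin for _, _, margin in rows)
margin_rate = total_margin / total_revenue

print(f"average of row rates: {naive_avg:.3f}")
print(f"margin_amt / revenue: {margin_rate:.3f}")
```

The two results differ because the small dealer's high rate is given the same weight as the large dealer's low rate when rates are averaged.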
Unit Amounts
• Unit price, unit cost, etc.
  - Are numeric, but not measures
  - Store the extended amounts, which are additive
  - Unit amounts may be useful as dimensions for "price point analysis"
  - May store unit values to save space
Factless Fact Table
• A fact table with no measures in it
• Nothing to measure...
• Except the convergence of dimensional attributes
• Sometimes store a "1" for convenience
• Examples: Attendance, Customer Assignments, Coverage
Dimension Table Details
Example Dimension Tables
Dealer: dealer_key, region, state, city, dealer
Model: model_key, brand, category, line, model
Time: time_key, year, quarter, month, date
Example Dimension Table Records
Time Dimension
time_key  year  quarter  month    date
1         1997  Q1       January  1/15/97
2         1997  Q1       January  1/16/97
3         1997  Q1       January  1/17/97
15        1997  Q2       April    4/1/97
777       1998  Q4       October  10/13/98
(Synthetic Key: time_key. Attributes: year, quarter, month, date.)
Example Dimension Table Records
[Table: Dealer dimension with columns dealer_key, region, state, city, dealer]
Example Snowflake Schema
SalesFacts: model_key, dealer_key, time_key, revenue, quantity
Model: model_key, model, line_key
Line: line_key, line, category_key
Category: category_key, category, brand_key
Brand: brand_key, brand
Day: date_key, date, month_key
Month: month_key, month, quarter_key
Quarter: quarter_key, quarter, year_key
Year: year_key, year
Dealer: dealer_key, dealer, city_key
City: city_key, city, state_key
State: state_key, state, region_key
Region: region_key, region
Slowly Changing Dimensions
• Dimension source data may change over time
• Relative to fact tables, dimension records change slowly
• Allows dimensions to have multiple 'profiles' over time to maintain history
• Each profile is a separate record in a dimension table
Slowly Changing Dimension Example
• Example: A woman gets married
  - Possible changes to customer dimension
    - Last Name
    - Marriage Status
    - Address
    - Household Income
  - Existing facts need to remain associated with her single profile
  - New facts need to be associated with her married profile
Designing Loads to
Customer Dimension Table
Column Name      SCD Type
Customer Key     N/A
Customer ID      1
Name             1
Marital Status   1
Home Income      1
Type 1 Example
[Figure: OLTP and star schema before and after "Sue Gets Married". Before: the Customer (OLTP) record and the single Customer Dim row both show Sue Jones, marital status S, and Sales Facts rows reference Cust Key and Day Key, with Day Dim holding Day Key and Business Date. After: the Customer (OLTP) record and the same Customer Dim row both show Sue Smith, marital status M, with an updated home income]
Type 1 Example
• Observations
  - Customer history is not maintained in the OLTP system
  - Customer history is not maintained in the star schema
  - Sue only has one customer 'profile' in the customer dimension table
  - Sue's sales facts across all history are associated with her married profile
  - The association between Sue's earlier sales facts and her single profile has been lost
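The Type 1 behaviour observed above can be sketched as an in-place overwrite. The dictionary layout, the customer id "C-1" and the sales amount are assumptions made for the illustration.

```python
# Customer dimension keyed by synthetic cust_key; all values are illustrative.
customer_dim = {1: {"cust_id": "C-1", "name": "Sue Jones", "marital_status": "S"}}
sales_facts = [{"cust_key": 1, "sales": 100.0}]  # recorded while Sue was single


def apply_type1(dim, cust_id, changes):
    """Overwrite attributes in place on every row for this source customer id."""
    for row in dim.values():
        if row["cust_id"] == cust_id:
            row.update(changes)


apply_type1(customer_dim, "C-1", {"name": "Sue Smith", "marital_status": "M"})

# The old fact still points at cust_key 1, which now describes a married customer.
status_seen_by_old_fact = customer_dim[sales_facts[0]["cust_key"]]["marital_status"]
print(status_seen_by_old_fact)
```

No new dimension row is created, so the earlier sale is now reported under the married profile.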
Designing Loads to
Customer Dimension Table
Column Name      SCD Type
Customer Key     N/A
Customer ID      2
Name             2
Marital Status   2
Home Income      1
Type 2 Example
[Figure: OLTP and star schema before and after "Sue Gets Married". Before: the Customer (OLTP) record and the Customer Dim row both show Sue Jones, marital status S, and Sales Facts rows reference Cust Key and Day Key, with Day Dim holding Day Key and Business Date. After: the Customer (OLTP) record shows Sue Smith, marital status M, while Customer Dim holds two rows with the same Cust ID, one for Sue Jones (S) and one for Sue Smith (M), each with its own Cust Key]
Type 2 Example
• Type 2 Observations
  - Customer history is not maintained in the OLTP system
  - Customer history is maintained in the star schema
  - Sue has two 'profiles' in the customer dimension
  - Sue's sales facts may be analyzed for when she was single, when she was married, and across all history by using the customer id field
  - Home income was updated in the new profile record
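A minimal sketch of a Type 2 change, under the assumption that each profile row carries a "current" flag (the deck does not say how the active profile is marked). Names, ids and sales amounts are invented.

```python
customer_dim = [
    {"cust_key": 1, "cust_id": "C-1", "name": "Sue Jones", "marital_status": "S", "current": True},
]
sales_facts = [{"cust_key": 1, "sales": 100.0}]  # recorded while Sue was single


def apply_type2(dim, cust_id, changes):
    """Expire the current profile and append a new one under a new synthetic key."""
    old = next(r for r in dim if r["cust_id"] == cust_id and r["current"])
    old["current"] = False
    new_key = max(r["cust_key"] for r in dim) + 1
    new = {**old, **changes, "cust_key": new_key, "current": True}
    dim.append(new)
    return new_key


married_key = apply_type2(customer_dim, "C-1", {"name": "Sue Smith", "marital_status": "M"})
sales_facts.append({"cust_key": married_key, "sales": 250.0})  # new facts use the new key

by_key = {r["cust_key"]: r for r in customer_dim}
sales_by_status = {}
for fact in sales_facts:
    status = by_key[fact["cust_key"]]["marital_status"]
    sales_by_status[status] = sales_by_status.get(status, 0.0) + fact["sales"]

# Across all history: group on the source customer id instead of the synthetic key.
total_for_sue = sum(f["sales"] for f in sales_facts if by_key[f["cust_key"]]["cust_id"] == "C-1")
print(sales_by_status, total_for_sue)
```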
Slowly Changing Dimension Advice
• When in doubt, design type 2
• When a slowly changing dimension speeds up - why not move it into a fact?
Degenerate Dimensions
• Dimensions with no other place to go
• Stored in the fact table
• Are not facts
• Common examples include invoice numbers or order numbers
Drilling
[Figure: Quarterly Auto Sales Summary showing Units Sold and Revenue by Region (Northeast, Southeast, Central, Northwest, Southwest), and a second view that adds State (Maine, New York, Massachusetts, Florida, Georgia, Virginia) for the Northeast and Southeast regions]
• Rolling up
  - Removing dimensional detail
  - Rolls up a measure
  - Has nothing to do with how you drilled down
Drilling
• Drilling across
  - A query that involves more than one fact table
  - Not necessarily an action that changes how a user is looking at the data
  - Best resolved by multiple SQL passes
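The multi-pass approach can be sketched in plain Python, with each pass standing in for one SQL query against one fact table. The product names and quantities are invented, and the merge step shown is one reasonable way to combine the passes, not the only one.

```python
from collections import defaultdict

# Hypothetical rows from two fact tables that share a product dimension.
sales_facts = [("standard", 5), ("standard", 3), ("deluxe", 2)]  # (product, units_sold)
shipment_facts = [("standard", 6), ("thermal", 4)]               # (product, units_shipped)


def total_by_product(rows):
    """One pass: aggregate a single fact table to the shared dimension."""
    totals = defaultdict(int)
    for product, qty in rows:
        totals[product] += qty
    return totals


sold = total_by_product(sales_facts)        # pass 1
shipped = total_by_product(shipment_facts)  # pass 2

# Merge step: full outer join of the two result sets on the shared dimension.
report = {
    p: {"units_sold": sold.get(p, 0), "units_shipped": shipped.get(p, 0)}
    for p in sorted(set(sold) | set(shipped))
}
print(report)
```

Aggregating each fact table separately before merging avoids the row duplication that a direct join between two fact tables would cause.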
Aggregate Schemas
Aggregate Designs - Performance Angle
• Aggregates
  - Pre-stored fact summaries
  - Along one or more dimensions
  - The most effective tool for improving performance
• Examples
  - Summary of sales by region, by product, by category
  - Monthly sales
Aggregate Background
• Aggregate rationale
  - Improve end user query performance
  - Reduce required CPU cycles
  - Powerful cost saving tool
• Restrictions
  - Additive facts only
  - Must use dimensional design
Aggregate Guidelines
• Don't start with aggregates
• Design and build based on usage
• Sooner or later you'll need to build aggregates
Aggregate Types - Examples
• Summary Tables
• Materialized Views
Aggregate Types
• Separate Tables
  - Separate fact table for every aggregate
  - Separate dimension table for every aggregate dimension
  - Same number of fact records as level field tables
• Advantage
  - Removes possibility of double counting
  - Schema clarity
Separate Tables
One Way Aggregate
Mthly Sales Facts Agg: month_key, product_key, market_key, Quantity, Amount
Sales Facts: time_key, product_key, market_key, Quantity, Amount
Product: product_key, Category, Brand, Product, Diet Indicator
Month: month_key, Year, Fiscal Period, Month
Market: market_key, Region, District, State, City
Time: time_key, Year, Fiscal Period, Month, Day, Day of Week
Separate Tables
Two Way Aggregate
Mnthly Cat Sales Facts Agg: month_key, category_key, market_key, Quantity, Amount
Sales Facts: time_key, product_key, market_key, Quantity, Amount
Product: product_key, Category, Brand, Product, Diet Indicator
Category: category_key, Category
Month: month_key, Year, Fiscal Period, Month
Market: market_key, Region, District, State, City
Time: time_key, Year, Fiscal Period, Month, Day, Day of Week
Aggregate Pitfalls
• Sparsity failure
  - Term used to describe the result of building too many aggregate fact tables that do not summarize enough rows
  - When sparsity failure occurs, a relatively small star schema can grow (in terms of disk size) thousands of times
  - Sparsity failure = aggregate explosion
Aggregate Design Guidelines
• Rule of twenty
  - To avoid aggregate explosion
  - Make sure each aggregate record summarizes 20 or more lower-level records
• Remember
  - Total number of possible fact tables in any given dimensional model = cartesian product of all levels in all the dimensions
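Both guidelines above reduce to simple arithmetic. The sketch below uses invented row counts and assumed hierarchy depths; the function names are hypothetical helpers, not part of any tool.

```python
from math import prod


def compression_ratio(base_rows, aggregate_rows):
    """Average number of base-level fact rows summarized by one aggregate row."""
    return base_rows / aggregate_rows


def passes_rule_of_twenty(base_rows, aggregate_rows):
    return compression_ratio(base_rows, aggregate_rows) >= 20


# Hypothetical row counts for a daily sales fact table and two candidate aggregates.
base = 10_000_000
monthly = 400_000    # 25 base rows per aggregate row
weekly = 2_000_000   # 5 base rows per aggregate row

# Number of levels in each dimension, counting the base level (assumed depths).
levels = {"time": 3, "product": 4, "market": 5}
possible_fact_tables = prod(levels.values())

print(passes_rule_of_twenty(base, monthly), passes_rule_of_twenty(base, weekly))
print(possible_fact_tables)
```

With these assumed depths there are 60 possible fact tables, which is why aggregates are chosen selectively rather than built for every combination.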
Aggregate Deployment
• Incremental
• Based on usage
• Transparent to users
• Typically warehouse DBA responsibility
Multiple Fact Tables
• Different business processes usually require different fact tables
• There are also several cases where a single business process will require multiple fact tables
  - Core and custom
  - Snapshot and transaction
  - Coverage
  - Aggregates
Different Business Processes
• Different business processes usually require different fact tables
• In practice, it may be hard to identify what a "process" is
• Sometimes you can spot different processes because measures are recorded
  - With different dimensions
  - At differing grains
Different Dimensions or Grain
Shipment Facts: time_key, product_key, shipper_key, market_key, Quantity, Weight
Sales Facts: time_key, product_key, market_key, Quantity, Amount
Product: product_key, Category, Brand, Product, Diet Indicator
Shipper: shipper_key, name, type, mode, address
Time: time_key, Year, Fiscal Period, Month, Day, Day of Week
Market: market_key, Region, District, State, City
Different Dimensions or Grain
• Don't take shortcuts with grain
  - The 'not applicable' dimension value
  - Using a 'not applicable' row in a dimension confuses the grain and can introduce reporting difficulty
Different Points in Time
• Sometimes, it is not easy to identify the discrete business processes
• All measures may have the same dimensionality or grain
• Different measures are recorded at different times
  - Quantity sold is not recorded at the same time as quantity shipped
Different Timing
• Building a single fact table would require recording zero or null for measures that are not applicable at a point in time
• Reports would contain a confusing combination of zeros, nulls, and absence of data
Different Timing - One Fact Table
Sales and Shipment Facts: time_key, product_key, market_key, Quantity_sold, Amount_sold, Quantity_shipped, Amount_shipped
  (annotation in the figure: "Initially will be null")
Time: time_key, Year, Fiscal Period, Month, Day, Day of Week
Market: market_key, Region, District, State, City
Product: product_key, Category, Brand, Product, Diet Indicator
Identifying Different Processes
• Look at the measures in question
• Sort them into fact tables based on
  - Dimensions
  - Grain
  - Differing timings of events measured
Core and Custom Schemas
• There is a set of dimension attributes and measures shared in all cases
• Depending on the value in a dimension, certain extra dimension attributes or measures are recorded
  - Heterogeneous products
  - Types of customers
Core and Custom
Account Facts: time_key, product_key, branch_key, customer_key, Balance, Transaction_count
Checking Account Facts: time_key, checking_key, branch_key, customer_key, Balance, Transaction_count, ...custom checking facts
Checking Account: checking_key, ...custom checking attributes
Product: product_key, ...
Customer: customer_key, ...
Time: time_key, ...
Branch: branch_key, ...
Core and Custom
• Core fact table and dimensions
  - All attributes shared no matter what
  - Appropriate for analysis across entire subject area
• Custom fact table and/or dimensions
  - Contain attributes specific to a particular dimension value (e.g. "Checking")
  - Only appropriate when the business question is limited to that particular dimension value
  - Should repeat shared facts to minimize need to access two fact tables
Coverage Schema
• A star schema usually measures events that happen
• Relationships between the dimensions involved are not captured if events do not happen
• A coverage table fills the gap
  - What did not sell that was on promotion?
  - Who was assigned to that customer?
• Usually "factless"
Measuring What
Sales Facts: time_key, product_key, customer_key, rep_key, quantity, sales_dollars
Product: product_key, Category, Brand, Product, SKU
Customer: customer_key, Name, Company, Account, Phone_num
Time: time_key, Year, Fiscal Period, Month, Day, Day of Week
Salesrep: rep_key, rep_name, rep_phone, Region, District, State, City
• Sales facts do not reveal who is assigned to a customer if they do not sell
Coverage Table
• Customer_coverage_facts shows who is assigned to a customer at a point in time
Customer Coverage Facts: time_key, customer_key, rep_key
Customer: customer_key, Name, Company, Account, Phone_num
Time: time_key, Year, Fiscal Period, Month, Day, Day of Week
Salesrep: rep_key, rep_name, rep_phone, Region, District, State, City
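A minimal sketch of how a factless coverage table answers questions that the sales fact table cannot. The key values are invented, and both tables are reduced to sets of (time_key, customer_key, rep_key) tuples for the illustration.

```python
# Factless coverage rows: (time_key, customer_key, rep_key). Values are illustrative.
customer_coverage_facts = {(1, 10, 100), (1, 11, 100), (1, 12, 101)}
# Sales fact rows reduced to the same three keys.
sales_facts = {(1, 10, 100)}

# Assignments on day 1 that produced no sale: coverage minus sales.
assigned_without_sale = sorted(customer_coverage_facts - sales_facts)

# "Who was assigned to customer 12 on day 1?" needs only the coverage table.
reps_for_customer_12 = sorted(
    rep for time_key, customer, rep in customer_coverage_facts
    if time_key == 1 and customer == 12
)
print(assigned_without_sale, reps_for_customer_12)
```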
Snapshot and Transaction
• Viewing a single process multiple ways
• Transactions
  - The changes to what is being measured
• Snapshot
  - The status at a point in time
• Example
  - Changes to inventory
  - Current status of inventory
Snapshot
• How much is on hand today?
• How much was on hand yesterday?
Inventory Snapshot: time_key, product_key, location_key, quantity_on_hand
Time: time_key, Year, Fiscal Period, Month, Day, Day of Week
Product: product_key, Category, Brand, Product, SKU
Location: location_key, Warehouse, WH_code, City, State
Transaction
• How did inventory change today?
• How much product was returned due to failed inspection?
Inventory Transactions: time_key, product_key, location_key, transaction_type_key, transaction_amount
Product: product_key, Category, Brand, Product, SKU
Location: location_key, Warehouse, WH_code, City, State
Time: time_key, Year, Fiscal Period, Month, Day, Day of Week
Transaction type: transaction_type_key, transaction_type_code, transaction_type, transaction_category
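One way to see how the two views relate is to derive end-of-day levels from the day's changes. This sketch assumes a single product at a single location, an opening balance of zero, and signed transaction amounts; none of those details come from the deck.

```python
from itertools import accumulate

# Hypothetical inventory transactions: (time_key, transaction_type, transaction_amount).
# Receipts are positive and shipments negative (an assumed sign convention).
transactions = [(1, "receipt", 100), (1, "shipment", -30), (2, "shipment", -20), (3, "receipt", 50)]

# Transaction view: net change per day.
change_by_day = {}
for time_key, _type, amount in transactions:
    change_by_day[time_key] = change_by_day.get(time_key, 0) + amount

# Snapshot view: running balance at the end of each day.
days = sorted(change_by_day)
quantity_on_hand = dict(zip(days, accumulate(change_by_day[d] for d in days)))

print(change_by_day)
print(quantity_on_hand)
```

The transaction table answers "how did inventory change today?", while the snapshot answers "how much is on hand?" without replaying every change.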
Aggregate Tables
• Aggregate table
  - A fact table that summarizes another fact table
  - Created for performance reasons
  - Covered in previous section
Multiple Fact Table Summary
• Different processes need different tables
• Identified with
  - Grain
  - Dimensionality
  - Timing
• Same process may need multiple fact tables
  - Heterogeneous attributes
  - Coverage
  - Snapshot and transaction
  - Aggregates
Architected Data Marts
Data Mart
• Meaning of the term 'data mart' has shifted over the last several years...
Data Mart Architecture 199?
[Figure: Operational Systems, E.T.L. Software, Data Warehouse, E.T.L. Software, Data Marts, Query & Reporting Software, Analysis Users]
Data Mart Architecture 199?
[Figure: Operational Systems, E.T.L. Software, Data Marts]
[Figure: Operational Systems, E.T.L. Software, Data Warehouse containing a Data Mart, Query & Reporting Software, Analysis Users]
Data Mart
• Warehouse Subject Area
  - Incremental warehouse development
  - Centralized architecture
  - Not new
  - Well-suited to star schemas
"Stovepipe" Data Marts
• "Stovepipe" data marts
  - Inconsistent and overlapping data
  - Difficult and costly to maintain
  - Redundant data load
  - Can't drill across
  - Integration requires starting over
• Dimensions not conformed
[Figure: three separate stars (Store Sales Facts, Shipments Facts, Warehouse Inventory Facts), each with its own copies of dimensions such as Product, Time (Day), Warehouse and Month]
Conformed Dimensions
• Definition
  - A dimension is conformed when multiple fact tables share that dimension
Conformed Dimensions
• Description
  - Shared common dimensions
  - Integrates logical design
  - Ensures consistency between data marts
  - Allows incremental development
  - Independent of physical location
  - Some re-work may be required
Conformed Dimensions
• Advantages
  - Enables an incremental development approach
  - Easier and cheaper to maintain
  - Drastically reduces extraction and loading complexity
  - Answers business questions that cross data marts
  - Supports both centralized and distributed architectures
Interlocking Star Schemas
[Figure: Sales Facts, Shipment Facts and Inventory Facts linked through conformed dimensions: Store Dimension, Product Dimension, Time Dimension, Warehouse Dimension, Month Dimension]
Kimball's Data Warehouse Bus
[Figure: bus matrix with rows Sales Facts, Shipment Facts, Inventory Facts and columns Store, Product, Day, Warehouse, Month]
When to Conform
• Two approaches
  - Up-front
  - As-you-go
  - Both approaches work
• Choose the approach that works for you
Conform Up Front
[Figure: Cross-Enterprise Analysis, a step whose label begins "Create First", Conform all Dimensions, then Finalize Design & Build for each subject area in turn]
Conform As-You-Go
[Figure: a sequence of "Design & Build Subject" steps, one per subject area]
• Data extracted from source systems (OLTP, flat files), depending on which records have been updated
• Temporary
• Rationale for dimensional modeling
• Dimensional modeling basics
• Dimensional modeling details
• Fact table details
• Dimension table details
• Design process
• Aggregate schemas
• Multiple fact tables
• Architected data marts