Dimensional Fact Model
Stuttgart, 26/11/2014 Stefano Cazzella @StefanoCazzella http://caccio.blogdns.net http://bimodeler.com stefano.cazzella{at}gmail.com
1 BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella
Complexity in SE and IS development
The art of programming is the art of organizing complexity, of mastering multitude and avoiding its bastard chaos as effectively as possible.
– Edsger Dijkstra, “Notes on Structured Programming”
Project Layers
Business
• User requirements
• Conceptual model
Design
• Technical choices
• Logical model
Build
• Technology
• Physical model
Civil Engineering Example
Business
What the client wants
Design
The technical blueprint
Build
The desired building
Model-driven engineering
PIM
• Business centric
• No technical details
PSM
• Technical design
• System architecture
Build
• Technical deliverables
• System realization
Model transformation
Project Layers for Data Mart
Business
• DFM
Design
• Relational model
Build
• DBMS-specific DDL
Dimensional Fact Model
Why the Dimensional Fact Model?
• Formal language → a well-specified syntax and an unequivocal interpretation (semantics) based on a sound algebraic definition
• Simple and effective graphical notation (representation)
• Specifically defined to represent multi-dimensional models
• Does not imply any technical/implementation choice
DFM Notation Compendium
Data Mart building process
[Figure: the Data Mart building process]
1. Requirements definition: business user's needs → multidimensional data model (Dimensional Fact Model)
2. Model transformation: multidimensional data model + technical specifications = logical data model (relational model: tables, columns, etc.)
3. Model transformation: logical data model + implementation strategy = physical data model (DDL with indexes, partitions, etc.)
4. Deployment: physical data model → Data Mart
Formalize the user's needs in a conceptual (business-centric) model, then …
… transform it into a logical model that integrates the technical specifications …
… and transform it again into a physical model that realizes the business requirements
Business - From requirement to DFM
• Context: weblog analytics, i.e. the analysis of visits to several websites belonging to different domains (e.g. Google Analytics)
• Requirement: monitor and analyze the number of visits and their monthly and daily average duration for each page of the websites, or for each domain, distributed by the geographic region of the visitors' IP addresses.
✓ Domain definition
✓ Aggregation rules
✓ Optional dependencies
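The requirement above can be written down as a small data structure before any technical choice is made. The following is only a sketch: the class names, the `Date` hierarchy, the level names and the domain names (`Counter`, `Duration`) are assumptions made for illustration, not the notation of the DFM itself or the content of the original diagram. The `Viewer` and `Page` hierarchy names come from the later slides.

```python
from dataclasses import dataclass, field


@dataclass
class Measure:
    name: str
    domain: str          # business domain of the values (hypothetical names)
    aggregations: tuple  # aggregation rules allowed for this measure


@dataclass
class Hierarchy:
    name: str
    levels: list                                # finest level first
    optional: set = field(default_factory=set)  # levels that may be missing


@dataclass
class FactSchema:
    name: str
    measures: list
    hierarchies: list

    def grain(self):
        """The fact grain is the finest level of each hierarchy."""
        return [h.levels[0] for h in self.hierarchies]


# Assumed reading of the weblog requirement: visits and duration,
# analyzed by date, by page/website/domain and by viewer IP/region.
visits = FactSchema(
    name="Visits",
    measures=[
        Measure("visits", "Counter", ("SUM",)),
        Measure("duration", "Duration", ("AVG",)),
    ],
    hierarchies=[
        Hierarchy("Date", ["day", "month"]),
        Hierarchy("Page", ["page", "website", "domain"]),
        Hierarchy("Viewer", ["ip", "region"], optional={"region"}),
    ],
)

print(visits.grain())
```

Nothing in this structure mentions tables, keys or data types, which is the point of keeping the business layer separate from the design layer.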
Design choices
Reference ROLAP model:
• Star schema (denormalized dimension table)
• Snowflake (hierarchies implemented by tables in 3NF)
Hierarchy implementation strategy (for every dimension):
• Use natural key (the dimension attribute → PK column)
• Use surrogate key (add a new column with no business meaning)
• Use slowly changing dimension (SCD) of type 2
• Use implicit dimension (no dimension table, only a column in the fact table)
Domain ↔ Data type association:
• Text → VARCHAR(250); Currency → NUMBER(9,2); etc.
Standard naming conventions and abbreviations:
• Table name prefix (D for Dimensions, F for Facts); Number → NBR; etc.
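The design choices listed above are essentially configuration for the model transformation. A minimal sketch of how they could be captured follows; the `DesignChoices` class and its method names are hypothetical, while the example values (the VARCHAR(250) and NUMBER(9,2) types, the D/F prefixes, the NBR abbreviation) are the ones quoted on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class DesignChoices:
    rolap_model: str = "star"  # "star" or "snowflake"
    hierarchy_strategy: dict = field(default_factory=dict)  # hierarchy -> strategy
    domain_types: dict = field(default_factory=dict)        # domain -> data type
    abbreviations: dict = field(default_factory=dict)       # word -> abbreviation

    def table_name(self, kind, name):
        """Apply the naming convention: D_ for dimensions, F_ for facts."""
        prefix = {"dimension": "D", "fact": "F"}[kind]
        words = [self.abbreviations.get(w, w) for w in name.upper().split()]
        return "_".join([prefix] + words)

    def column_type(self, domain):
        """Look up the data type associated with a business domain."""
        return self.domain_types[domain]


choices = DesignChoices(
    rolap_model="star",
    hierarchy_strategy={"Viewer": "surrogate_key", "Page": "scd2"},
    domain_types={"Text": "VARCHAR(250)", "Currency": "NUMBER(9,2)"},
    abbreviations={"NUMBER": "NBR"},
)

print(choices.table_name("fact", "visits"))
print(choices.table_name("dimension", "page number"))
print(choices.column_type("Text"))
```

Keeping these choices outside the conceptual model means the same DFM can be transformed again with a different strategy, for example a snowflake schema, without touching the business layer.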
Transform a DFM into a Relational Model
Technical design choices:
• Reference ROLAP model → star schema
• Hierarchy Viewer → use surrogate key
• Hierarchy Page → SCD Type 2
[Figure: the DFM and the resulting relational model; callouts: model transformation, fact grain, surrogate key, SCD-2 start date and end date]
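The transformation described above can be sketched as two small functions that turn hierarchies into denormalized dimension tables and build the fact table from their keys. This is an illustrative sketch, not the algorithm used by any tool: the function names, the attribute lists, the column names and the column types are assumptions, and the Date hierarchy is left out for brevity.

```python
def dimension_table(hierarchy, attributes, strategy):
    """Return (table_name, columns) for one hierarchy of a star schema.

    Each column is a (name, data_type, role) tuple.
    """
    name = hierarchy.upper()
    columns = []
    if strategy in ("surrogate_key", "scd2"):
        # Surrogate key: a new column with no business meaning.
        columns.append((name + "_ID", "INTEGER", "PK"))
    for position, attribute in enumerate(attributes):
        is_natural_pk = strategy == "natural_key" and position == 0
        role = "PK" if is_natural_pk else ""
        columns.append((attribute.upper(), "VARCHAR(250)", role))
    if strategy == "scd2":
        # SCD type 2 keeps history with a validity interval per row.
        columns += [("START_DATE", "DATE", ""), ("END_DATE", "DATE", "")]
    return "D_" + name, columns


def fact_table(fact, dimensions, measures):
    """Build the fact table: one foreign key per dimension, then measures."""
    columns = []
    for table, dim_columns in dimensions:
        key, sql_type = next((c, t) for c, t, role in dim_columns if role == "PK")
        columns.append((key, sql_type, "FK -> " + table))
    columns += [(m.upper(), "INTEGER", "") for m in measures]
    return "F_" + fact.upper(), columns


viewer = dimension_table("Viewer", ["ip", "region"], "surrogate_key")
page = dimension_table("Page", ["page", "website", "domain"], "scd2")
visits = fact_table("Visits", [viewer, page], ["visits", "duration"])

for table, columns in (viewer, page, visits):
    print(table, [name for name, _type, _role in columns])
```

In a star schema all levels of a hierarchy land in one dimension table; a snowflake variant would instead emit one table per level.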
Build choices
Choose the DBMS:
• SQL Server, Oracle, Hive / Hadoop
Generate constraints?
• Generate unique keys / primary keys / integrity constraints (foreign keys)
Add specific indexes:
• Add clustered indexes / column-store indexes / bitmap indexes / etc.
Define table partitions:
• Organize fact tables in partitions (by hash, value, range, etc.)
Distribute data over multiple volumes:
• Define file groups / tablespaces for tables, partitions, indexes
Physical model and DDL (1)
Implementation choices & best practice:
• DBMS → SQL Server
• Fact F_VISITS partitioned by year
• Column-store index on day and duration
• 2 distinct file groups for tables and indexes
[Figure: physical model and DDL; callouts: partition scheme and functions, columnstore index, file groups]
Physical model and DDL (2)
Implementation choices & best practice:
• DBMS → Oracle
• Fact F_VISITS partitioned by year
• Bitmap index on viewer dimension
• 2 distinct tablespaces for tables and indexes
[Figure: physical model and DDL; callouts: table partitions, bitmap index, tablespaces]
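The last transformation step, from the relational model to DBMS-specific DDL, can be sketched as a function that renders the Oracle choices above as text. This is a hedged illustration only: the column names (`VISIT_YEAR`, `VIEWER_ID`), the tablespace names, the index naming pattern and the list of years are hypothetical, and the generated statements are meant to follow Oracle's range-partitioning and bitmap-index syntax but have not been run against a database. It does not reproduce the DDL shown on the slide.

```python
def oracle_fact_ddl(table, columns, partition_column, years,
                    bitmap_column, table_ts, index_ts):
    """Render CREATE TABLE and CREATE BITMAP INDEX statements as strings."""
    cols = ",\n".join(f"  {name} {sql_type}" for name, sql_type in columns)
    # One range partition per year: rows with year < (year + 1).
    parts = ",\n".join(
        f"  PARTITION P{year} VALUES LESS THAN ({year + 1})" for year in years
    )
    create_table = (
        f"CREATE TABLE {table} (\n{cols}\n)\n"
        f"TABLESPACE {table_ts}\n"
        f"PARTITION BY RANGE ({partition_column}) (\n{parts}\n)"
    )
    # LOCAL keeps the bitmap index partitioned like the table itself.
    create_index = (
        f"CREATE BITMAP INDEX BX_{table}_{bitmap_column} "
        f"ON {table} ({bitmap_column}) LOCAL TABLESPACE {index_ts}"
    )
    return [create_table, create_index]


statements = oracle_fact_ddl(
    table="F_VISITS",
    columns=[
        ("VISIT_YEAR", "NUMBER(4)"),   # hypothetical partitioning column
        ("VIEWER_ID", "NUMBER(10)"),   # hypothetical key of the viewer dimension
        ("PAGE_ID", "NUMBER(10)"),
        ("DURATION", "NUMBER(9,2)"),
    ],
    partition_column="VISIT_YEAR",
    years=[2013, 2014],
    bitmap_column="VIEWER_ID",
    table_ts="TS_TABLES",
    index_ts="TS_INDEXES",
)

for statement in statements:
    print(statement + ";\n")
```

A SQL Server variant would replace the partition clause with a partition function and scheme and the bitmap index with a columnstore index, which is why the physical layer is generated per DBMS rather than modeled once.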
BI Modeler
• In order to apply a model-driven approach, BI project teams need a software tool to:
✓ Manage (draw) all the models: DFM, relational, etc.
✓ Support (and drive) the model transformation process
• There were (and still are) not many tools able to do that, so in 2006 I started working on the development of …
http://bimodeler.com
DEMO
Create a DFM about SALES from scratch
Define the fact schema and its measures
Add some dimensions / hierarchies
Define and associate domains to attributes and measures
Transform a DFM into a relational data model
Define an implementation strategy for hierarchies
Associate data types with domains
Apply a naming convention
Add physical properties to the relational model
Choose a DBMS
Create partitions
Create indexes
Generate DDL