Upload
ranjan-ganguli
View
23
Download
1
Embed Size (px)
Citation preview
Data Warehousing by Example
A Day at the IPL in Kolkata
Prepared By: RANJAN GANGULI
M.En(CSE),University Institute Of Technology
The University of Burdwan, West Bengal,India
email:[email protected]
Why? The purpose of this document is to present a ‘Best Practice’ approach to Data Warehouse design
based on some experiences as there is no complete and consistent design methodology to design a
data warehouse.
Getting Started:
Here, I use a trip to watch a Cricket match of IPL in Kolkata India, to show how we can apply our approach to
a real-world situation and develop a design for a tailored Dimensional Model. This approach leads to the
implementation of a Reference Data Architecture and the design of a Data Warehouse.
A Canonical Data Model (CDM) is central to this and the Design Patterns based on a CDM.
Best Practice suggests that when all the steps have been completed, each item produced
should be reviewed and extended or modified as appropriate.
Additional Events:
During my trip to Kolkata, more events can be added as:
Event –I. Buy IPL ticket
Event-II. Ate food in a restaurant
Event-III. Watch the IPL match
The Approach:
The Approach is to follow these Steps: Step 1 – Identify the Events involved Step 2 – Define a Design Pattern based on the Event-driven Canonical Data Model Step 3 - Define a Message Format for the data in each Event Step 4 - Design a 3rd Normal Form Data Warehouse (DWH) and update it for each Event. Step 5 – Define the format for loading data into the DWH for each Message
The reason for all the work that we have done to get to this point is, of course, to produce
Business Intelligence (‘BI’).
Canonical Data Model
(A General Format)
Typical Events could be: i. A Customer makes a Purchase
ii. A Supplier makes a Delivery of Merchandise Typical Documents could be : i. A Sales Receipt
ii. A Contract Letter
iii. A Delivery Note People and Organisations are examples of the Roles played by Parties. Parties are often shown in Models produce d by professional Data Models. In that case, Best Practice usually dictates that Semantic Models are produced to help business users
understand how Customers and so on, are modelled as Parties and Roles.
(Design Pattern based Canonical Data Model)
Message Format:
EVENT-I( BUY TICKET FOR IPL)
Activity Performed:
Step-I: Identify the Events:
Event-I. Buy Ticket for IPL Cricket Match
Step-II: The Design Pattern Below shows how the Design Pattern applies to this Event.
(Please see the above CDM format)
EVENT
(PURCHASE A TICKET)
SERVICES
(CHECK-IN)
STAFF
(PURCHASE A TICKET)
DOCUMENTS
i. Ticket for train/bus
ii. Ticket for IPL
LOCATION
(EDEN GARDEN)
Kolkata
CUSTOMERS
CREDIT CARD
Step-III: Message Format
This shows the data items on the Ticket:-
EVENT
DATE LOCATION PRICE DETAILS
Purchase a ticket
30.01.2017
KOLKATA
2700
BLOCK-A ROW-123
Step-IV: The 3NF Data ware House
The benefit of adopting a Third-Normal Form ERD is that it enforces a ‘Single View of the Truth’. If we adopt a Dimensional Model it is not so easy to achieve this. This shows the design of the Data Warehouse (DWH) after the first Event of Purchasing a Ticket
CUSTOMERS
CUSTOMER SERVICE
DOCUMENT
STAFF SERVICES
LOCATION
ADDRESS
CREDIT CARD
Step-V:
Data Mart
This shows the Data Mart for Ticket Sales.
Similarly:
Event 2 – Get Lunch This shows how we handle the Second Event.
The Design Pattern
This shows how the Design Pattern applies to this Event.
Message Format:
EVENT
DATE LOCATION PRICE DETAILS
Buy Lunch
Date and Time
At Restaurant
Total Price
Rice, Chicken,Dal etc
BUY LUNCH MENU STAFF
CUSTOMERS CREDIT-CARD RESTAURENT
SALES RECEIPT
DATAWARE HOUSE
Restaurant Data Mart:
This shows the Data Mart for Restaurant data.
Customers Service
Customers
Address
Credit Card
Supplier
Services
Document Location
Event 3 – Watch the IPL Match:
The Design Pattern This shows how the Design Pattern applies to this Event. In this case, the Event is the Match between two Teams and the Outcome is very important It is quite common for Event to have an Outcome, but so far, it has not been important enough to justify appearing at the top level.
Message Format:
EVENT
DATE LOCATION PRICE DETAILS
Watch the IPL
Date and Time
At Eden Garden
Price of Ticket
Results/Outcome
Competition Cricket Venue Staff
Audience
(Customers)
9
Outcome
My Cricket
Data Warehouse : This shows the design of the Data Warehouse (DWH) after the third Event of watching the cricket match.
Data Mart : This shows the Data Mart for Cricket Competition Results data.
Customer
Customer Service
Documents Location Outcome
Services
Supplier
Staff
Staff
Staff
Combined Data Mart:
This shows the three Data Marts: I. IPL Match Results
II. Restaurant Orders
III. Ticket Sales