Introduction to OLAP operations
What is a Data Warehouse?
A data warehouse is a repository of an organization's stored data.
It is designed for query and analysis rather than for transaction
processing, so that it can support reporting and analysis.
It usually contains historical data derived from
transaction data, but it can include data from other sources.
It separates analysis workload from transaction workload and
enables an organization to consolidate data from several
sources.
CPIT 440
Difference between Data Warehouse and Database
A question we are often asked in the field is: I already have a
database, so why do I need a data warehouse? What is the
difference between a database and a data warehouse?
A database on its own isn't designed to handle analytics well.
A data warehouse exists as a layer on top of another database or
databases: it takes the data from all of these databases and creates
a layer optimized for and dedicated to analytics.
www.company.com
Introduction to cubes:
A cube is a set of data that is usually constructed from a subset
of a data warehouse and is organized and summarized into a
multidimensional structure defined by a set of dimensions and
measures.
Cubes are the main objects in online analytical processing
(OLAP), a technology that provides fast access to data in a data
warehouse.
Cube Structure:
Every cube has a schema, which is the set of joined tables in the
data warehouse from which the cube draws its source data.
The central table in the schema is the fact table, the source of
the cube's measures.
The other tables are dimension tables, the sources of the cube's
dimensions.
A cube's structure is defined by its measures and dimensions.
They are derived from tables in the cube's data source.
The set of tables from which a cube's measures and dimensions are
derived is called the cube's schema.
Every cube schema consists of a single fact table and one or more
dimension tables.
The cube's measures are derived from columns in the fact
table.
The cube's dimensions are derived from columns in the dimension
tables.
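The relationship between a fact table, dimension tables, measures, and dimensions can be sketched with Python's built-in sqlite3 module. The table names (fact_sales, dim_product, dim_store), their columns, and the sample rows are hypothetical and chosen only for illustration; they do not come from the course material.

```python
import sqlite3

# Hypothetical star schema: one fact table plus two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id   INTEGER REFERENCES dim_store(store_id),
        amount     REAL
    );
    INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Games');
    INSERT INTO dim_store   VALUES (10, 'Jeddah'), (20, 'Riyadh');
    INSERT INTO fact_sales  VALUES (1, 10, 5.0), (1, 20, 7.0),
                                   (2, 10, 3.0), (1, 10, 2.0);
""")

# The measure comes from a fact-table column (SUM of amount);
# the dimensions come from dimension-table columns (category, city).
rows = conn.execute("""
    SELECT p.category, s.city, SUM(f.amount) AS total_amount
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_store   s ON s.store_id   = f.store_id
    GROUP BY p.category, s.city
    ORDER BY p.category, s.city
""").fetchall()

# Each (category, city) pair is one cell of a small two-dimensional cube.
cube_cells = {(category, city): total for category, city, total in rows}
print(cube_cells)
```

Each key of cube_cells is a combination of dimension values, and each value is the aggregated measure for that cell.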
Cube Structure
Star schema: A fact table in the middle connected to a set of
dimension tables
Snowflake schema: A refinement of star schema where some
dimensional hierarchy is normalized into a set of smaller dimension
tables, forming a shape similar to snowflake
Fact constellations: Multiple fact tables share dimension tables,
viewed as a collection of stars, therefore called galaxy schema or
fact constellation
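Drill-down (roll down): the reverse of roll-up.
It moves from summarized data to more detailed data, or introduces
new dimensions.
Slice and dice: a slice selects on a single dimension; a dice selects
on two or more dimensions.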
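A minimal sketch of these operations on a tiny in-memory cuboid may help. The cell layout (day, city, product), the sample values, and the helper names roll_up, dice, and slice_ are assumptions made for this illustration; a real OLAP server exposes these operations through its own query language.

```python
from collections import defaultdict

# Hypothetical base cuboid: (day, city, product) -> units sold.
base = {
    ("2004-01-05", "Jeddah", "pen"): 4,
    ("2004-01-07", "Jeddah", "pen"): 3,
    ("2004-02-01", "Riyadh", "pen"): 2,
    ("2005-03-09", "Jeddah", "ink"): 5,
}

def roll_up(cube, axis, to_parent):
    """Map one axis to its parent level and re-aggregate by summing."""
    out = defaultdict(int)
    for key, value in cube.items():
        new_key = key[:axis] + (to_parent(key[axis]),) + key[axis + 1:]
        out[new_key] += value
    return dict(out)

def dice(cube, conditions):
    """Keep cells that satisfy every {axis: allowed_values} condition."""
    return {
        key: value
        for key, value in cube.items()
        if all(key[axis] in allowed for axis, allowed in conditions.items())
    }

def slice_(cube, axis, value):
    """A slice is a selection on a single dimension."""
    return dice(cube, {axis: {value}})

# Roll-up on time from day to year (the year is the first four characters).
by_year = roll_up(base, 0, lambda day: day[:4])
# Slice for year = 2004.
only_2004 = slice_(by_year, 0, "2004")
# Dice on year and city together.
jeddah_2004 = dice(by_year, {0: {"2004"}, 1: {"Jeddah"}})
print(by_year, only_2004, jeddah_2004, sep="\n")
```

Drill-down has no function of its own here: once cells have been summed they cannot be split apart again, so going back to day-level detail means returning to base (or to another more detailed cuboid that was kept).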
Exercise 1
Suppose that a data warehouse consists of the three dimensions:
time, doctor, and patient, and the two measures count and charge,
where charge is the fee that a doctor charges a patient for a
visit.
(a) Enumerate three classes of schemas that are popularly used for
modeling data warehouses.
Three classes of schemas popularly used for modeling data
warehouses are:
The star schema,
The snowflake schema,
The fact constellation schema.
(b) Draw a schema diagram for the above data warehouse using one of
the schema classes listed in part (a).
(c) Starting with the base cuboid [day; doctor; patient], what
specific OLAP operations should be performed in order to list the
total fee collected by each doctor in 2004?
Roll-up on time from day to year.
Slice for time = 2004.
Roll-up on patient from individual patient to all.
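The answer to part (c) can be traced on a handful of sample cells. The doctor names, patient ids, dates, and charges below are invented for illustration. Summing over the patient dimension is what leaves one total per doctor.

```python
from collections import defaultdict

# Hypothetical base cuboid cells: (day, doctor, patient) -> charge.
base = {
    ("2004-03-01", "Dr. A", "p1"): 50.0,
    ("2004-07-15", "Dr. A", "p2"): 70.0,
    ("2004-09-09", "Dr. B", "p1"): 40.0,
    ("2005-01-02", "Dr. A", "p1"): 60.0,
}

# Roll-up on time from day to year.
by_year = defaultdict(float)
for (day, doctor, patient), charge in base.items():
    by_year[(day[:4], doctor, patient)] += charge

# Slice for time = 2004.
in_2004 = {key: charge for key, charge in by_year.items() if key[0] == "2004"}

# Aggregate over the patient dimension so one total per doctor remains.
fee_per_doctor = defaultdict(float)
for (_year, doctor, _patient), charge in in_2004.items():
    fee_per_doctor[doctor] += charge
fee_per_doctor = dict(fee_per_doctor)

print(fee_per_doctor)
```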
Exercise 2
Suppose that a data warehouse for Big University consists of the
following four dimensions: student, course, semester, and
instructor, and two measures count and avg. grade.
When at the lowest conceptual level (e.g., for a given student,
course, semester, and instructor combination), the avg. grade
measure stores the actual course grade of the student.
At higher conceptual levels, avg. grade stores the average grade
for the given combination.
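One way to see how the same measure yields an actual grade at the lowest level and an average at higher levels is to track a sum and a count per group. This is a sketch under assumptions: the sample cells and the helper name avg_grade_by are hypothetical, and keeping sum and count is a common implementation choice rather than something stated in the exercise.

```python
from collections import defaultdict

# Hypothetical base cells: (student, course, semester, instructor) -> grade.
base = {
    ("s1", "CS101", "F04", "i1"): 90.0,
    ("s1", "CS102", "F04", "i2"): 70.0,
    ("s2", "CS101", "F04", "i1"): 80.0,
}

def avg_grade_by(cells, axes):
    """Average grade grouped by the given axis positions of the base key."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum of grades, count]
    for key, grade in cells.items():
        group = tuple(key[a] for a in axes)
        totals[group][0] += grade
        totals[group][1] += 1
    return {group: s / c for group, (s, c) in totals.items()}

# Lowest level: every group holds one cell, so the average is the grade itself.
lowest = avg_grade_by(base, (0, 1, 2, 3))
# Higher level: group by course only, averaging over the other dimensions.
per_course = avg_grade_by(base, (1,))
print(lowest)
print(per_course)
```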
(a) Draw a snowflake schema diagram for the data warehouse.
(b) Starting with the base cuboid [student; course; semester;
instructor], what specific OLAP operations should be performed in
order to list the average grade of CS courses for each Big University
student?
The specific OLAP operations to be performed are:
Roll-up on course from course id to department.
Roll-up on student from student id to university.
Dice on course, student with department = "CS" and university = "Big
University".
Drill-down on student from university to student name.
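The net effect of these operations can be sketched in a single pass over sample data. The lookup tables course_dept and student_info stand in for the course and student dimension hierarchies, and all ids, names, and grades are hypothetical. The sketch computes the final result directly and does not materialize each intermediate cuboid. It also averages over semester and instructor, which the result requires.

```python
from collections import defaultdict

# Hypothetical dimension lookups (concept hierarchies).
course_dept = {"CS101": "CS", "CS102": "CS", "MA201": "Math"}
student_info = {
    "s1": ("Amal", "Big University"),
    "s2": ("Omar", "Big University"),
    "s3": ("Sara", "Other University"),
}

# Hypothetical base cuboid: (student_id, course_id, semester, instructor) -> grade.
base = {
    ("s1", "CS101", "F04", "i1"): 90.0,
    ("s1", "CS102", "S05", "i2"): 70.0,
    ("s1", "MA201", "F04", "i3"): 60.0,
    ("s2", "CS101", "F04", "i1"): 85.0,
    ("s3", "CS101", "F04", "i1"): 50.0,
}

totals = defaultdict(lambda: [0.0, 0])  # student name -> [sum of grades, count]
for (student_id, course_id, _semester, _instructor), grade in base.items():
    name, university = student_info[student_id]   # student hierarchy levels
    department = course_dept[course_id]           # course id -> department
    # Dice: department = "CS" and university = "Big University".
    if department == "CS" and university == "Big University":
        totals[name][0] += grade
        totals[name][1] += 1

# Result at the student-name level (after the drill-down on student).
avg_cs_grade = {name: s / c for name, (s, c) in totals.items()}
print(avg_cs_grade)
```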
Exercise 3
Suppose that a data warehouse consists of the four dimensions:
date, spectator, location, and game, and the two measures, count
and charge, where charge is the fee that a spectator pays when
watching a game on a given date.
Spectators may be students, adults, or seniors, with each category
having its own charge rate.
(a) Draw a star schema diagram for the data warehouse.
(b) Starting with the base cuboid [date; spectator; location;
game], what specific OLAP operations should be performed in order to
list the total charge paid by student spectators at GM Place in
2004?
Roll-up on location from location id to location name.
Roll-up on game from game id to all.