13
20.5 Data Cubes Instructor : Dr. T.Y. Lin Chandrika Satyavolu 222

20.5 Data Cubes Instructor : Dr. T.Y. Lin Chandrika Satyavolu 222

Embed Size (px)

Citation preview

Page 1: 20.5 Data Cubes Instructor : Dr. T.Y. Lin Chandrika Satyavolu 222

20.5Data Cubes

Instructor : Dr. T.Y. Lin

Chandrika Satyavolu

222

Page 2: 20.5 Data Cubes Instructor : Dr. T.Y. Lin Chandrika Satyavolu 222

Outline

• Introduction• The Cube Operator• Aggregation• Cube implementation by Materialized Views• Materialized Views• Queries using materialized view• The Lattice of Views• Lattice of a materialized view

Page 3: 20.5 Data Cubes Instructor : Dr. T.Y. Lin Chandrika Satyavolu 222

Introduction

• A formal data cube precomputes all possible aggregates in a systematic way.

• Amount of extra storage required is tolerable.

• If the warehoused data does not change, there is no penalty incurred to keep the aggregates up-to-date.

Page 4: 20.5 Data Cubes Instructor : Dr. T.Y. Lin Chandrika Satyavolu 222

The Cube Operator• Given : a fact table F, an

augmented table would be CUBE(F) and it adds a new value ‘*’, to each dimension.

• * means any. It aggregates dimension in which it appears.

• Border represents only a small addition to the volume of the CUBE.

• A tuple of CUBE(F) that has ‘*’ in one or more dimension will have sum of values of that dependent attribute in all tuples in that dimension.

Page 5: 20.5 Data Cubes Instructor : Dr. T.Y. Lin Chandrika Satyavolu 222

Example of aggregation :Given fact table:– Sales (serialNo, date, dealer, price)– serialNo is a key, so aggregating over all dates, dealers or

prices will still give just one result with that serialNo– Replace serialNo by two attributes : model and color (both

non-key)– It could give total sales price for

• a given model• a given color• a given dealer• a given date

– To get average sales, add a new dimension cnt and replace price with val i.e. sum of prices over a category

– Relation Sales now becomes:Sales (model, color, date, dealer, val, cnt)

Aggregation

Page 6: 20.5 Data Cubes Instructor : Dr. T.Y. Lin Chandrika Satyavolu 222

• Example of a tuple in both Sales and CUBE(Sales) would be:(‘Gobi’, ‘red’, ‘2001-05-21’, ‘Friendly-Fred’, 45000, 2)

• Aggregating over all colors:(‘Gobi’, *, ‘2001-05-21’, ‘Friendly-Fred’, 152000, 7)

• Aggregating over all colors and dealers:(‘Gobi’, *, ‘2001-05-21’, *, 2348000, 100)

• Aggregating over all colors, times and dealers:(‘Gobi’, *, *, *, 1339800000, 58000)

• Aggregating over all colors, times, dealers and models:(*, *, *, *, 3521727000, 198000)

Aggregating over all dimensions tells us that the total sales of all models in all colors, over all time at all dealers is 198000 cars for a total price of $3521727000

Page 7: 20.5 Data Cubes Instructor : Dr. T.Y. Lin Chandrika Satyavolu 222

• Query and Tuple:– If the query specifies a value v for attribute a, then the tuple t

has v in its component for a.– If the query groups by an attribute a, then t has a non-* value

in its component for a.– If the query neither groups by an attribute a nor specifies a

value a, then t has a * value in its component for a.

• Example:– Query:

SELECT color, AVG(price) FROM Sales WHERE model =

‘Gobi’ GROUP BY color;

– Tuple:(‘Gobi’, c, *, *, v, n)

Page 8: 20.5 Data Cubes Instructor : Dr. T.Y. Lin Chandrika Satyavolu 222

Cube implementation by Materialized Views

• Until now, we assumed that queries choose to aggregate completely in a dimension or not to aggregate at all.

• Some dimensions can have many degrees of granularity. Example: – days can be aggregated by weeks, months, quarters or

years

• As number of choices of grouping increases, cost of storing the results increases too.

• Answer to this is : chose some materialized views of the data cube.

Page 9: 20.5 Data Cubes Instructor : Dr. T.Y. Lin Chandrika Satyavolu 222

Materialized Views• A materialized view is the result of some query, which we choose to store in

the database, rather than reconstructing it as needed in response to queries.

• Coarser the partition, the less space materialized view takes.

• To use a view to answer a query, the view must not partition any more coarser than the query does.

• Example: Group dates by months and dealers by city– INSERT INTO SalesV1 SELECT model, color, month, city, SUM(val) AS val,

SUM(cnt) AS cnt FROM Sales JOIN Dealers on dealer = name GROUP BY model, color, month, city;

• Example: Group dates by weeks and dealers by state and over all colors– INSERT INTO SalesV2 SELECT model, week, state, SUM(val) AS val, SUM(cnt)

AS cnt FROM Sales JOIN Dealers on dealer = name GROUP BY model, week, state;

Page 10: 20.5 Data Cubes Instructor : Dr. T.Y. Lin Chandrika Satyavolu 222

Queries using materialized view

Q1 : SELECT model, SUM(val) FROM Sales GROUP BY model;

can be answered by:

SELECT model, SUM(val) FROM SalesV1 GROUP BY model;

or

SELECT model, SUM(val) FROM SalesV2 GROUP BY model;

Q2 : SELECT model, year, state, SUM(val) FROM Sales JOIN Dealers ON

dealer = name GROUP BY model, year, state;

can only be answered by:

SELECT model, year, state, SUM(val) FROM SalesV1 GROUP BY model,

year, state;

Q3 : SELECT model, color, date, SUM(val) FROM Sales GROUP BY model,

color, date;

can be answered by neither SalesV1 nor SalesV2.

Page 11: 20.5 Data Cubes Instructor : Dr. T.Y. Lin Chandrika Satyavolu 222

The Lattice of Views

• Lattice arranges groupings of similar dimensions in a hierarchical order.

• Points of lattice are the ways we can partition a dimension.• If partition P1 is below partition P2 (P1≤P2), if each group of P1 is

contained within some group of P2.• Examples of lattice:

Lattice of partitions on time interval Lattice of partitions on dealers

Page 12: 20.5 Data Cubes Instructor : Dr. T.Y. Lin Chandrika Satyavolu 222

Lattice of a materialized view

• If V1 and V2 are two different views, then V1≤ V2 means in each dimension, the partition P1 from view V1 is at least as fine as partition P2 from view V2; that is P1≤P2.

• General rule:We can answer a query Q using view V if and only if V ≤ Q.

Example of view lattice:

Page 13: 20.5 Data Cubes Instructor : Dr. T.Y. Lin Chandrika Satyavolu 222

Thank You.