Materialization and Cubing Algorithms

Cube Materialization

• Each cell of the data cube is a view consisting of an aggregation of interest.

• The values of these cells are dependent on the values of other cells in the data cube.

• Materializing some or all of these cells is a common and powerful query optimization technique.

Materialization (contd.)

• The size of the data warehouse and the complexity of queries can cause queries to take very long to complete.

• Materializing (precomputing) the results of frequently asked queries is a commonly used technique for improving performance.

Issues in View Materialization

• What views should we materialize, and what indexes should we build on the precomputed results?

• Given a query and a set of materialized views, can we use the materialized views to answer the query?

• How frequently should we refresh materialized views to make them consistent with the underlying tables? (And how can we do this incrementally?)
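
The second and third questions can be illustrated with a minimal sketch. It assumes a view that stores SUM aggregates as a dictionary keyed by dimension values, and a refresh that only has to handle newly inserted rows; the function names, the `loans` measure, and the sample data are hypothetical and not taken from the slides.

```python
from collections import defaultdict

def can_answer(view_dims, query_dims):
    """A SUM view can answer a query that groups by a subset of the view's dimensions."""
    return set(query_dims) <= set(view_dims)

def answer_from_view(view, view_dims, query_dims):
    """Roll a materialized view (dict: key tuple -> sum) up to the query's dimensions."""
    idx = [view_dims.index(d) for d in query_dims]
    out = defaultdict(int)
    for key, total in view.items():
        out[tuple(key[i] for i in idx)] += total
    return dict(out)

def refresh_incrementally(view, view_dims, new_rows, measure):
    """Fold newly inserted base rows into the view instead of recomputing it."""
    for row in new_rows:
        key = tuple(row[d] for d in view_dims)
        view[key] = view.get(key, 0) + row[measure]

# Hypothetical view grouped by (country, year).
view_dims = ("country", "year")
view = {("Norway", 1985): 10, ("Norway", 1986): 30, ("USA", 1985): 14}

usable = can_answer(view_dims, ("country",))
per_country = answer_from_view(view, view_dims, ("country",))
refresh_incrementally(view, view_dims,
                      [{"country": "USA", "year": 1985, "loans": 2}], "loans")
```

This shortcut only works for distributive aggregates such as SUM and COUNT, and deletions or updates in the base tables would need extra bookkeeping.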

Bottom up Cubing (BUC)

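• BUC is a cube-construction algorithm that proceeds from the apex cuboid (most aggregated) toward the base cuboid (most specific). It is called bottom-up because it views the cuboid lattice with the apex at the bottom.

• BUC can use the Apriori pruning property to compute iceberg cubes: if a cell fails the minimum support threshold, none of its more specific descendant cells can satisfy it, so they need not be computed. The next slide describes how the algorithm applies this.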

BUC algorithm

• It is a recursive algorithm that partitions the data on one dimension at a time, which facilitates iceberg pruning.

• It does not allow simultaneous aggregation; its best feature is the sharing of partitioning costs.
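
The recursion and the pruning step can be sketched as follows. This is a simplified illustration, not the published algorithm in full: it assumes COUNT as the measure, in-memory rows given as tuples, and `"*"` as the marker for an aggregated dimension; the names `buc` and `_buc` and the sample rows are hypothetical.

```python
def _buc(rows, n_dims, min_sup, start, cell, out):
    out[tuple(cell)] = len(rows)              # emit the aggregate of the current cell
    for d in range(start, n_dims):
        # Partition the current rows on dimension d.
        parts = {}
        for r in rows:
            parts.setdefault(r[d], []).append(r)
        for value, part in parts.items():
            if len(part) >= min_sup:          # iceberg (Apriori) pruning
                cell[d] = value
                # Recurse only into the remaining dimensions of this partition.
                _buc(part, n_dims, min_sup, d + 1, cell, out)
        cell[d] = "*"                         # restore before trying the next dimension

def buc(rows, n_dims, min_sup=1):
    """Return {cell: count} for every cell whose count is at least min_sup."""
    out = {}
    if len(rows) >= min_sup:
        _buc(rows, n_dims, min_sup, 0, ["*"] * n_dims, out)
    return out

rows = [("a", "x"), ("a", "y"), ("b", "x")]
iceberg = buc(rows, n_dims=2, min_sup=2)
full = buc(rows, n_dims=2, min_sup=1)
```

With `min_sup=2`, the partition for `"b"` is discarded as soon as it is seen, so none of its descendant cells are ever computed.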

Bottom-Up Data Cube Computation example

           1985  1986  1987  1988
Norway       10    30    20    24
…            23    45    14    32
USA          14    32    42    11

           1985  1986  1987  1988
All          47   107    76    67

            All
Norway       84
…           114
USA          99

            All
All         297

Cell values: numbers of loan applications
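
The aggregated cells in this example can be reproduced with a short sketch. It assumes SUM as the aggregate and uses the `…` row exactly as it appears in the table; the helper name `roll_up` is hypothetical.

```python
from collections import defaultdict

YEARS = (1985, 1986, 1987, 1988)
ROWS = {
    "Norway": (10, 30, 20, 24),
    "…": (23, 45, 14, 32),        # elided row, kept as shown in the table
    "USA": (14, 32, 42, 11),
}
# Base cuboid: (country, year) -> number of loan applications.
base = {(c, y): v for c, vals in ROWS.items() for y, v in zip(YEARS, vals)}

def roll_up(cuboid, keep):
    """Sum out every key position not listed in `keep`, marking it 'All'."""
    out = defaultdict(int)
    for key, v in cuboid.items():
        out[tuple(k if i in keep else "All" for i, k in enumerate(key))] += v
    return dict(out)

by_year = roll_up(base, {1})         # (All, year)
by_country = roll_up(base, {0})      # (country, All)
apex = roll_up(by_country, set())    # (All, All), computed from a smaller cuboid
```

The apex cell is computed from `by_country` rather than from the base cuboid, which reflects the point that cell values depend on the values of other cells.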

Introduction to MOLAP cube

• Computing multiple related group-bys and aggregates is one of the core operations of On-Line Analytical Processing (OLAP) applications.

• Although the array-based cubing algorithm is designed for MOLAP systems, it can also be used for Relational OLAP (ROLAP) systems: the table data is converted to an array, cubed as if in a MOLAP system, and then converted back to a table.

Array Storage

• There are three major issues relating to the storage of the array that must be resolved:

– It is likely in a multidimensional application that the array is too large to fit in memory

– It is likely that many of the cells in the array are empty, because there is no data for that combination of coordinates

– In many cases an array will need to be loaded from data that is not in array format (e.g., from a relational table or from an external load file)

Resolving Storage Issues

• A large n-dimensional array that cannot fit into memory is divided into small n-dimensional chunks (sized to correspond to the disk block size), and each chunk is stored as one object on disk

• Sparse chunks (with data density less than 40%) use “chunk-offset compression”, in which a pair (offsetInChunk, data) is stored for each valid array entry

• To load data from formats other than arrays, a partition-based loading algorithm is used that takes as input the table, each dimension size and a predefined chunk size, and returns a (possibly compressed) chunked array
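
Chunk-offset compression can be sketched as below. The sketch assumes a chunk that has already been flattened to a one-dimensional list in which `None` marks an empty cell; the function names and the tuple layout returned by `store_chunk` are hypothetical, and only the 40% threshold and the (offsetInChunk, data) pairs come from the slide.

```python
def compress_chunk(chunk, empty=None):
    """Return (offsetInChunk, data) pairs for the valid cells of a flattened chunk."""
    return [(offset, value) for offset, value in enumerate(chunk) if value != empty]

def decompress_chunk(pairs, size, empty=None):
    """Rebuild the flattened chunk from its (offsetInChunk, data) pairs."""
    chunk = [empty] * size
    for offset, value in pairs:
        chunk[offset] = value
    return chunk

def store_chunk(chunk, threshold=0.4, empty=None):
    """Compress only chunks whose data density is below the threshold."""
    pairs = compress_chunk(chunk, empty)
    density = len(pairs) / len(chunk)
    if density < threshold:
        return ("sparse", len(chunk), pairs)
    return ("dense", len(chunk), list(chunk))

sparse_chunk = [None, 7, None, None, None, 3, None, None]   # density 0.25
dense_chunk = [1, 2, None, 4]                               # density 0.75

stored_sparse = store_chunk(sparse_chunk)
stored_dense = store_chunk(dense_chunk)
```

Storing the chunk size alongside the pairs lets the reader restore the original layout without any other metadata.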

Basic Array Cubing Algorithm

1. Construct the minimum-size spanning tree for the group-bys of the cube.

2. Compute any group-by $D_{i_1} D_{i_2} \cdots D_{i_k}$ of a cube from the “parent” $D_{i_1} D_{i_2} \cdots D_{i_{k+1}}$ that has the minimum size.

3. Read in each chunk of $D_{i_1} D_{i_2} \cdots D_{i_{k+1}}$ along the dimension $D_{i_{k+1}}$ and aggregate each chunk to a chunk of $D_{i_1} D_{i_2} \cdots D_{i_k}$.

4. Once the chunk of $D_{i_1} D_{i_2} \cdots D_{i_k}$ is complete, output it to disk and reuse the memory for the next chunk of $D_{i_1} D_{i_2} \cdots D_{i_k}$, keeping only one chunk in memory at a time.
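
The four steps can be sketched in two small pieces. This is an illustration under simplifying assumptions, not the algorithm as specified: group-by size is estimated as the product of dimension sizes, the parent is a dense two-dimensional list held in memory, and a generator stands in for writing chunks to disk. The function names and the dimension sizes are hypothetical.

```python
from itertools import combinations
from math import prod

def min_size_parents(dim_sizes):
    """Steps 1-2: map each group-by to its smallest parent (one extra dimension)."""
    dims = sorted(dim_sizes)

    def size(group_by):
        return prod(dim_sizes[d] for d in group_by)

    tree = {}
    for k in range(len(dims)):
        for g in combinations(dims, k):
            parents = [tuple(sorted(g + (d,))) for d in dims if d not in g]
            tree[g] = min(parents, key=size)
    return tree

def aggregate_along_last(parent, chunk):
    """Steps 3-4: aggregate a 2-D parent over its last dimension, chunk by chunk."""
    n_rows, n_cols = len(parent), len(parent[0])
    for r0 in range(0, n_rows, chunk):
        rows = range(r0, min(r0 + chunk, n_rows))
        child_chunk = [0] * len(rows)           # the only child chunk held in memory
        for c0 in range(0, n_cols, chunk):      # read parent chunks along the last dimension
            for i, r in enumerate(rows):
                child_chunk[i] += sum(parent[r][c0:c0 + chunk])
        yield child_chunk                       # complete: hand off, e.g. write to disk

tree = min_size_parents({"A": 40, "B": 400, "C": 4000})

# Parent cuboid: 3 rows by 4 columns, aggregated over the column dimension.
parent = [[10, 30, 20, 24], [23, 45, 14, 32], [14, 32, 42, 11]]
child_chunks = list(aggregate_along_last(parent, chunk=2))
```

In this example the group-by on `B` is computed from `AB` rather than `BC`, because `AB` has far fewer cells.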


Inferences

• The Multi-Way Array algorithm overlaps the computation of different group-bys while using minimal memory for each group-by.

• Thus, the algorithm is valuable in both ROLAP and MOLAP systems.