Materialization and Cubing Algorithms
Cube Materialization
• Each cell of the data cube is a view consisting of an aggregation of interest.
• The values of these cells are dependent on the values of other cells in the data cube.
• Materializing some or all of these cells is a common and powerful query optimization technique.
Materialization (contd.)
• The size of the data warehouse and the complexity of queries can cause queries to take very long to complete.
• Materializing (precomputing) frequently asked queries is a commonly used technique for improving performance.
Issues in View Materialization
• What views should we materialize, and what indexes should we build on the precomputed results?
• Given a query and a set of materialized views, can we use the materialized views to answer the query?
• How frequently should we refresh materialized views to make them consistent with the underlying tables? (And how can we do this incrementally?)
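The second and third questions can be illustrated with a minimal sketch. It assumes views are SUM aggregates held in plain dictionaries; the function names (`roll_up`, `answer`, `refresh`), the `amount` measure and the sample data are all hypothetical and not taken from any particular system.

```python
from collections import defaultdict

def roll_up(view_dims, view_cells, query_dims):
    """Aggregate a materialized SUM view onto a subset of its dimensions."""
    positions = [view_dims.index(d) for d in query_dims]
    result = defaultdict(int)
    for key, total in view_cells.items():
        result[tuple(key[p] for p in positions)] += total
    return dict(result)

def answer(query_dims, views):
    """Answer a group-by from the smallest usable view, or return None.

    A view is usable when its dimensions include every query dimension.
    """
    usable = [(dims, cells) for dims, cells in views.items()
              if set(query_dims) <= set(dims)]
    if not usable:
        return None  # the query must fall back to the base tables
    dims, cells = min(usable, key=lambda v: len(v[1]))
    return roll_up(dims, cells, query_dims)

def refresh(view_dims, view_cells, new_rows):
    """Incrementally fold newly inserted rows into a SUM view.

    Each row is a dict; "amount" is a hypothetical measure column.
    """
    for row in new_rows:
        key = tuple(row[d] for d in view_dims)
        view_cells[key] = view_cells.get(key, 0) + row["amount"]

# Hypothetical materialized views, keyed by their dimension tuple
views = {
    ("country", "year"): {("Norway", 1985): 10, ("Norway", 1986): 30,
                          ("USA", 1985): 14, ("USA", 1986): 32},
    ("year",): {(1985,): 24, (1986,): 62},
}
by_year = answer(("year",), views)        # served directly from the small view
by_country = answer(("country",), views)  # rolled up from (country, year)
```

The incremental refresh shown here only works because SUM can be updated from the new rows alone; aggregates such as MIN or MAX under deletions would need more bookkeeping.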
Bottom up Cubing (BUC)
• BUC is an algorithm for cube construction that proceeds from the apex cuboid (most aggregated) toward the base cuboid (most specific). It is called the bottom-up approach because its lattice is drawn with the apex cuboid at the bottom.
• BUC can use the Apriori pruning property to compute iceberg cubes: a partition that fails the minimum support threshold is not expanded any further.
BUC algorithm
• It is a recursive algorithm that partitions the data one dimension at a time, which facilitates iceberg pruning.
• It does not aggregate several group-bys simultaneously; its main strength is the sharing of partitioning costs.
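A minimal sketch of the recursion, assuming COUNT as the measure, tuples as rows and "*" as the marker for an aggregated dimension. The function name `buc` and the sample rows are hypothetical; this is an illustration of the idea, not the published pseudocode.

```python
def buc(rows, dims, min_support, start=0, cell=None, out=None):
    """Recursive BUC-style sketch that computes an iceberg cube with COUNT.

    rows: list of tuples; dims: number of dimensions;
    cell: current group-by values, "*" for aggregated dimensions.
    The apex cell is always emitted in this sketch.
    """
    if cell is None:
        cell = ["*"] * dims
    if out is None:
        out = {}
    out[tuple(cell)] = len(rows)          # aggregate the current partition
    for d in range(start, dims):
        partitions = {}
        for row in rows:                  # partition once on dimension d
            partitions.setdefault(row[d], []).append(row)
        for value, part in partitions.items():
            if len(part) >= min_support:  # iceberg (Apriori) pruning
                cell[d] = value
                buc(part, dims, min_support, d + 1, cell, out)
        cell[d] = "*"                     # restore before the next dimension
    return out

# Hypothetical two-dimensional input
rows = [("a1", "b1"), ("a1", "b2"), ("a1", "b1"), ("a2", "b1")]
cube = buc(rows, dims=2, min_support=2)
```

Each partition built on dimension d is reused for every deeper group-by that extends it, which is where the shared partitioning cost comes from. Partitions such as a2 that fall below the threshold are dropped together with all of their descendants.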
Bottom-Up Data Cube Computation example
Base cuboid (Country x Year)
          1985  1986  1987  1988
Norway      10    30    20    24
…           23    45    14    32
USA         14    32    42    11

Year cuboid (Country = All)
          1985  1986  1987  1988
All         47   107    76    67

Country cuboid (Year = All)
           All
Norway      84
…          114
USA         99

Apex cuboid
           All
All        297

Cell Values: Numbers of loan applications
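The totals in this example can be checked with a few lines of Python. The variable names are illustrative, and the elided "…" row is kept as an opaque label rather than given a name.

```python
years = [1985, 1986, 1987, 1988]

# Base cuboid from the example; the "…" row is kept exactly as labeled
base = {
    "Norway": [10, 30, 20, 24],
    "…":      [23, 45, 14, 32],
    "USA":    [14, 32, 42, 11],
}

# Aggregate away Year, then Country, then both
by_country = {country: sum(values) for country, values in base.items()}
by_year = {year: sum(values[i] for values in base.values())
           for i, year in enumerate(years)}
apex = sum(by_country.values())
```

The apex value can be reached from either one-dimensional cuboid, which is the dependency between cells that materialization exploits.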
Introduction to MOLAP cube
• Computing multiple related group-bys and aggregates is one of the core operations of On-Line Analytical Processing (OLAP) applications.
• Although the Multi-Way Array algorithm is designed for MOLAP systems, it can also be used for Relational OLAP (ROLAP) systems: table data is converted to an array, cubed as if in a MOLAP system, and then converted back to a table.
Array Storage
• There are three major issues relating to the storage of the array that must be resolved:
– It is likely in a multidimensional application that the array is too large to fit in memory
– It is likely that many of the cells in the array are empty, because there is no data for that combination of coordinates
– In many cases an array will need to be loaded from data that is not in array format (e.g., from a relational table or from an external load file)
Resolving Storage Issues
• A large n-dimensional array that cannot fit into memory is divided into small n-dimensional chunks (sized to correspond to the disk block size), and each chunk is stored as one object on disk
• Sparse chunks (with data density less than 40%) use a “chunk-offset compression” where for each valid array entry a pair, (offsetInChunk, data), is stored
• To load data from formats other than arrays, a partition-based loading algorithm is used that takes as input the table, each dimension size and a predefined chunk size, and returns a (possibly compressed) chunked array
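A minimal sketch of chunk-offset compression, assuming a chunk is a flat list in which None marks an empty cell. The function names and the ("sparse"/"dense") tagging are hypothetical choices made for illustration; only the 40% density threshold and the (offsetInChunk, data) pair come from the text above.

```python
def compress_chunk(chunk, threshold=0.4):
    """Return ("sparse", [(offsetInChunk, data), ...]) or ("dense", values).

    chunk: flat list of cell values, with None marking an empty cell.
    """
    pairs = [(offset, value) for offset, value in enumerate(chunk)
             if value is not None]
    density = len(pairs) / len(chunk)
    if density < threshold:
        return ("sparse", pairs)      # store only the valid entries
    return ("dense", list(chunk))     # dense chunks are stored as-is

def decompress_chunk(stored, size):
    """Rebuild the flat chunk of the given size from its stored form."""
    kind, payload = stored
    if kind == "dense":
        return list(payload)
    chunk = [None] * size
    for offset, value in payload:
        chunk[offset] = value
    return chunk

sparse_chunk = [None, 5, None, None, None, None, 7, None]   # density 0.25
dense_chunk = [1, 2, None, 4]                               # density 0.75
stored_sparse = compress_chunk(sparse_chunk)
stored_dense = compress_chunk(dense_chunk)
```

Storing the offset inside the chunk rather than full coordinates keeps each pair small, since the chunk's own position already fixes the rest of the address.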
Basic Array Cubing Algorithm
1. Construct the minimum-size spanning tree for the group-bys of the cube
2. Compute any group-by D_i1 D_i2 ... D_ik of the cube from the “parent” D_i1 D_i2 ... D_i(k+1) that has the minimum size
3. Read in each chunk of D_i1 D_i2 ... D_i(k+1) along the dimension D_i(k+1) and aggregate each chunk to a chunk of D_i1 D_i2 ... D_ik
4. Once the chunk of D_i1 D_i2 ... D_ik is complete, output it to disk and reuse the memory for the next chunk of D_i1 D_i2 ... D_ik, keeping only one chunk in memory at a time
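The steps can be sketched for a two-dimensional parent AB aggregated to A. This assumes SUM as the aggregate, chunks held in a dictionary keyed by chunk coordinates, and a generator standing in for "write to disk"; the function names and the sizes are hypothetical.

```python
def smallest_parent(target, sizes):
    """Pick the parent group-by (target plus one dimension) with fewest cells.

    sizes: dict mapping a frozenset of dimension names to its cell count.
    """
    parents = [g for g in sizes if target < g and len(g) == len(target) + 1]
    return min(parents, key=lambda g: sizes[g])

def aggregate_along_last(chunks, n_row_chunks, n_col_chunks, chunk_rows):
    """Aggregate a chunked 2-D array AB to A, one output chunk at a time.

    chunks[(r, c)] is a list of rows (lists) for that chunk.
    Yields each completed chunk of A so the caller can write it out.
    """
    for r in range(n_row_chunks):
        out = [0] * chunk_rows            # the only chunk of A held in memory
        for c in range(n_col_chunks):     # read along the dimension being dropped
            for i, row in enumerate(chunks[(r, c)]):
                out[i] += sum(row)
        yield out                         # chunk complete: hand it off

# Hypothetical group-by sizes (in cells)
sizes = {frozenset("AB"): 16, frozenset("AC"): 40,
         frozenset("BC"): 40, frozenset("ABC"): 160}
parent = smallest_parent(frozenset("A"), sizes)

# A 4 x 4 array holding 1..16 row by row, split into 2 x 2 chunks
chunks = {
    (0, 0): [[1, 2], [5, 6]],     (0, 1): [[3, 4], [7, 8]],
    (1, 0): [[9, 10], [13, 14]],  (1, 1): [[11, 12], [15, 16]],
}
a_chunks = list(aggregate_along_last(chunks, 2, 2, chunk_rows=2))
```

Reading the parent's chunks in order along the dropped dimension is what lets each output chunk be finished before the next one is started.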
Inferences
• The Multi-Way Array algorithm overlaps the computation of different group-bys while using minimal memory for each group-by.
• Thus, the algorithm is valuable in both ROLAP and MOLAP systems