Indexing OLAP Data
Sunita Sarawagi

Monowar Hossain
York University


Agenda
  – Requirements on indexing methods
  – Existing indexing methods
  – Optimization of R-Tree for OLAP data
  – R-Tree vs. bit-mapped indices
  – Conclusion

Requirements on indexing methods
Symmetric partial match queries
  – Continuous, e.g. “time between Jan to July 94”
  – Discontinuous, e.g. “first month of each year”
Indexing at multiple levels of aggregation
  – Pre-computed group-bys
  – Indexing summary data
Handling multiple traversal orders
Efficient batch update
Handling sparse data efficiently
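
To make the two query shapes concrete, here is a minimal sketch of a partial match query in which any subset of dimensions may be restricted. The fact rows, the `partial_match` helper and its parameters are hypothetical names chosen for illustration; a real index would avoid the linear scan used here.

```python
from datetime import date

# Hypothetical fact rows: (product, store, sale_date, amount).
ROWS = [
    ("soap", "S1", date(1994, 1, 15), 10),
    ("soap", "S2", date(1994, 8, 3), 20),
    ("tea", "S1", date(1995, 1, 9), 30),
    ("tea", "S2", date(1994, 6, 30), 40),
]

def partial_match(rows, product=None, store=None, when=None):
    """Return amounts of rows matching every restriction that is given.

    Dimensions left as None are unrestricted, so all dimensions are
    treated symmetrically. `when` is a predicate on the sale date.
    """
    result = []
    for p, s, d, amount in rows:
        if product is not None and p != product:
            continue
        if store is not None and s != store:
            continue
        if when is not None and not when(d):
            continue
        result.append(amount)
    return result

# Continuous restriction: one contiguous range of the time dimension.
def jan_to_july_94(d):
    return date(1994, 1, 1) <= d <= date(1994, 7, 31)

# Discontinuous restriction: scattered values of the time dimension.
def first_month_of_year(d):
    return d.month == 1

continuous = partial_match(ROWS, when=jan_to_july_94)
discontinuous = partial_match(ROWS, when=first_month_of_year)
```

The continuous predicate selects one interval, which range-oriented structures handle well, while the discontinuous one selects many separate values along the same dimension.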

Existing methods
Multidimensional array-based methods
  – Work efficiently when data is dense
  – Essbase’s schema
      – E.g. a four-dimensional cube: product and store (sparse), time and scenario (dense)
      – B-tree on product and store
      – Two-dimensional array on time and scenario
  – Evaluation of Essbase’s schema
      – May cause multiple searches, e.g. searching store = “something” on the product-store index
      – Performance depends on the ability to find enough dense dimensions
      – Efficient batch update
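
The sparse/dense split described above can be sketched as follows. This is only an illustration under stated assumptions: a sorted Python list stands in for the B-tree, and the class name, method names and the `TIMES` and `SCENARIOS` values are hypothetical, not Essbase's actual API.

```python
import bisect

# Assumed dense dimensions with small, fixed domains.
TIMES = ["Q1", "Q2", "Q3", "Q4"]
SCENARIOS = ["actual", "budget"]

class SparseDenseCube:
    def __init__(self):
        self._keys = []    # sorted (product, store) pairs; stand-in for a B-tree
        self._blocks = {}  # (product, store) -> 2-D array indexed [time][scenario]

    def insert(self, product, store, time, scenario, value):
        key = (product, store)
        if key not in self._blocks:
            bisect.insort(self._keys, key)
            self._blocks[key] = [[None] * len(SCENARIOS) for _ in TIMES]
        self._blocks[key][TIMES.index(time)][SCENARIOS.index(scenario)] = value

    def lookup(self, product, store, time, scenario):
        block = self._blocks.get((product, store))
        if block is None:
            return None
        return block[TIMES.index(time)][SCENARIOS.index(scenario)]

    def keys_for_product(self, product):
        # Product is the leading key component, so one search plus a short scan.
        i = bisect.bisect_left(self._keys, (product,))
        found = []
        while i < len(self._keys) and self._keys[i][0] == product:
            found.append(self._keys[i])
            i += 1
        return found

    def keys_for_store(self, store):
        # Store is not the leading component: every key has to be examined.
        return [k for k in self._keys if k[1] == store]

cube = SparseDenseCube()
cube.insert("soap", "S1", "Q1", "actual", 100)
cube.insert("soap", "S2", "Q2", "budget", 80)
cube.insert("tea", "S1", "Q1", "actual", 50)
```

In this sketch a restriction on store alone cannot use the key order, which is one way to picture the "multiple searches" concern raised in the evaluation.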

Existing methods (cont.)
Bit-mapped indices
  – Pros:
      – For low-cardinality data, bitmaps are both space- and retrieval-efficient
      – Support bitwise operations
      – Access to data is clustered
      – All dimensions are handled symmetrically
  – Cons:
      – Range queries are handled poorly
      – Increased space overhead of storing the bitmaps, especially for high-cardinality data
      – Expensive batch update, as all bit-mapped indices have to be modified even for a single row insertion
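
A minimal sketch of a bit-mapped index follows, using Python integers as bit vectors (bit i is set when row i has the value). The `BitmapIndex` class, its methods and the sample columns are hypothetical and only illustrate the pros and cons listed above.

```python
class BitmapIndex:
    """One bitmap per distinct value of a single column."""

    def __init__(self, values):
        self.bitmaps = {}
        for row_id, value in enumerate(values):
            self.bitmaps[value] = self.bitmaps.get(value, 0) | (1 << row_id)

    def eq(self, value):
        # Equality predicate: a single bitmap lookup.
        return self.bitmaps.get(value, 0)

    def between(self, lo, hi):
        # Range predicate: OR together one bitmap per value in the range,
        # which gets costly as the number of distinct values grows.
        result = 0
        for value, bitmap in self.bitmaps.items():
            if lo <= value <= hi:
                result |= bitmap
        return result

def row_ids(bitmap):
    """Decode a bitmap into row ids; they come out in ascending order."""
    ids = []
    i = 0
    while bitmap:
        if bitmap & 1:
            ids.append(i)
        bitmap >>= 1
        i += 1
    return ids

# Hypothetical columns of a five-row fact table.
region_index = BitmapIndex(["east", "west", "east", "north", "west"])
month_index = BitmapIndex([1, 2, 3, 1, 2])

# Conjunction of predicates on two dimensions is one bitwise AND.
east_in_jan_feb = row_ids(region_index.eq("east") & month_index.between(1, 2))
```

Inserting one new row would require touching a bitmap in every such index, which illustrates the batch update cost noted in the cons.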

Existing methods (cont.)
Bit-mapped indices variants
  – Compression
  – Hybrid
  – Dynamic bit-maps
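
The slide only names the compression variant. As one possible illustration, the sketch below uses plain run-length encoding, which is an assumed technique chosen for simplicity and not necessarily the scheme the author has in mind. The function names are hypothetical.

```python
def rle_encode(bits):
    """Collapse a list of 0/1 bits into (bit, run_length) pairs."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1
        else:
            runs.append([b, 1])
    return [(b, n) for b, n in runs]

def rle_decode(runs):
    """Expand (bit, run_length) pairs back into the original bit list."""
    bits = []
    for b, n in runs:
        bits.extend([b] * n)
    return bits

# A sparse bitmap: long runs of zeros compress to a few pairs.
sparse_bitmap = [0] * 6 + [1] + [0] * 5 + [1, 1]
encoded = rle_encode(sparse_bitmap)
```

Bitmaps for high-cardinality columns are mostly zeros, so run-based encodings of this kind reduce the space overhead at the price of extra decoding work.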

Existing methods (cont.)
Hierarchical indices
  – Example: Product - Store
      – Index product first; also store summaries at the product level
      – For each product value, create an index on store and keep summaries at the product-store level
  – Pros:
      – Allow faster access to higher-level data
      – Dimensions are handled symmetrically
  – Cons:
      – Index storage overhead
      – Average retrieval efficiency can suffer because of the large indexing structure
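
The Product - Store example can be sketched with nested dictionaries, where each level of the index also carries the summary for that level. The class and method names are hypothetical, and sums are assumed as the stored summary.

```python
class HierarchicalIndex:
    """Product -> Store index that keeps a summary at each level."""

    def __init__(self):
        # product -> {"total": sum over all stores, "stores": {store: sum}}
        self.products = {}

    def add(self, product, store, amount):
        node = self.products.setdefault(product, {"total": 0, "stores": {}})
        node["total"] += amount  # product-level summary
        node["stores"][store] = node["stores"].get(store, 0) + amount

    def product_total(self, product):
        # Answered at the first level, without visiting any store entries.
        return self.products[product]["total"]

    def product_store_total(self, product, store):
        return self.products[product]["stores"][store]

index = HierarchicalIndex()
index.add("soap", "S1", 10)
index.add("soap", "S2", 20)
index.add("soap", "S1", 5)
index.add("tea", "S1", 30)
```

Higher-level aggregates are read directly from the upper level, while the extra summaries at every node are one source of the storage overhead listed in the cons.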

Existing methods (cont.)
Multidimensional indices
  – Use of the indexing methods designed for spatial data
      – E.g. R-Tree, Grid Files, etc.

Optimized R-Tree for OLAP data
Rectangular dense regions (store only the boundaries of regions that contain more than a threshold number of points)
  – Each contains a pointer to a variable-length array of TIDs or of the tuples themselves
Points in sparse regions
Finding dense regions
  – Ask an expert?
  – Use a clustering algorithm (similar algorithms exist in image analysis)
Need evaluation!!
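
The split into dense regions and sparse points can be sketched as below. A fixed grid with a point-count threshold stands in for the clustering step, which is an assumption made for brevity. The function name, its parameters and the integer 2-D coordinates are hypothetical.

```python
from collections import defaultdict

def split_dense_sparse(points, cell_size, threshold):
    """Partition 2-D integer points into dense rectangles and sparse points.

    A grid cell holding more than `threshold` points becomes a dense region,
    kept as (bounds, tids) with bounds = (x_min, y_min, x_max, y_max).
    All other points are returned individually as (point, tid).
    """
    cells = defaultdict(list)
    for tid, (x, y) in enumerate(points):
        cells[(x // cell_size, y // cell_size)].append(tid)

    dense_regions = []
    sparse_points = []
    for (cx, cy), tids in cells.items():
        if len(tids) > threshold:
            bounds = (
                cx * cell_size,
                cy * cell_size,
                (cx + 1) * cell_size - 1,
                (cy + 1) * cell_size - 1,
            )
            dense_regions.append((bounds, tids))
        else:
            sparse_points.extend((points[t], t) for t in tids)
    return dense_regions, sparse_points

pts = [(0, 0), (1, 1), (0, 1), (1, 0), (9, 9), (5, 2)]
dense, sparse = split_dense_sparse(pts, cell_size=2, threshold=3)
```

Only the rectangle boundaries and one array of TIDs are kept per dense region, so an R-Tree built over this output would hold far fewer entries than one indexing every point.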

R-Tree vs. bit-mapped indices
R-Tree pros:
  – Allows range queries
  – Smaller space overhead
  – Update is more efficient
Bit-mapped pros:
  – Faster bit-wise operations
  – Efficient for low cardinality, few restricted dimensions, and sparse data

Conclusion
High-level overview
Recommended readings
  – MOLAP VS OLAP
  – R-Tree and variants
  – R-Tree alternatives
  – Computation of multidimensional aggregates
  – And more…