Indexing OLAP Data
Sunita Sarawagi
Monowar Hossain, York University
Agenda
– Requirements on indexing methods
– Existing indexing methods
– Optimization of R-Tree for OLAP data
– R-Tree vs. bit-mapped indices
– Conclusion
Requirements on indexing methods
– Symmetric partial match queries
  – Continuous, e.g. "time between Jan and July 94"
  – Discontinuous, e.g. "first month of each year"
– Indexing at multiple levels of aggregation
  – Pre-computed group-bys
  – Indexing summary data
– Handling multiple traversal orders
– Efficient batch update
– Handling sparse data efficiently
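The two kinds of partial match query can be illustrated with a small sketch. It is only an illustration: the row layout, the `partial_match` helper and the sample data are assumptions, and a linear scan stands in for whatever index would actually serve the query.

```python
def partial_match(rows, predicates):
    """Return rows satisfying every predicate; unlisted dimensions are unconstrained."""
    return [
        row for row in rows
        if all(test(row[dim]) for dim, test in predicates.items())
    ]

# Hypothetical fact rows with three dimensions.
rows = [
    {"product": "soap", "year": 1994, "month": 1},
    {"product": "soap", "year": 1994, "month": 9},
    {"product": "tea", "year": 1995, "month": 1},
    {"product": "tea", "year": 1994, "month": 6},
]

# Continuous: time between Jan and July 94 (one contiguous range).
continuous = partial_match(
    rows, {"year": lambda y: y == 1994, "month": lambda m: 1 <= m <= 7}
)

# Discontinuous: first month of each year (matches are scattered over time).
discontinuous = partial_match(rows, {"month": lambda m: m == 1})
```

In both queries the product dimension is left unconstrained, which is what makes them partial matches.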
Existing methods
– Multidimensional array-based methods
  – Work efficiently when data is dense
  – Essbase's schema
    – E.g. a four-dimensional cube: product and store (sparse), time and scenarios (dense)
    – B-tree on product and store
    – Two-dimensional array on time and scenarios
  – Evaluation of Essbase's schema
    – May cause multiple searches, e.g. searching store = "something" on the product-store index
    – Performance depends on the ability to find enough dense dimensions
    – Efficient batch update
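The sparse/dense split described above can be sketched as follows. This is a minimal illustration, not Essbase's actual implementation: a Python dict stands in for the B-tree on (product, store), and the class name, the dimension values and the method names are all assumptions.

```python
TIMES = ["Q1", "Q2", "Q3", "Q4"]      # dense dimension (assumed values)
SCENARIOS = ["actual", "budget"]      # dense dimension (assumed values)

class SparseDenseCube:
    def __init__(self):
        # Stand-in for the B-tree keyed on the sparse pair (product, store).
        self.index = {}

    def put(self, product, store, time, scenario, value):
        # Each existing (product, store) pair owns one dense 2-D array.
        block = self.index.setdefault(
            (product, store),
            [[None] * len(SCENARIOS) for _ in TIMES],
        )
        block[TIMES.index(time)][SCENARIOS.index(scenario)] = value

    def get(self, product, store, time, scenario):
        block = self.index.get((product, store))
        if block is None:
            return None
        return block[TIMES.index(time)][SCENARIOS.index(scenario)]

    def blocks_for_store(self, store):
        # store is not the leading key component, so a lookup on store
        # alone has to inspect the entry for every product.
        return [key for key in self.index if key[1] == store]
```

A fully specified lookup touches one index entry and one array cell, while `blocks_for_store` shows why a query on store alone may need multiple searches.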
Existing methods (cont.)
– Bit-mapped indices
  – Pros:
    – For low-cardinality data, bitmaps are both space- and retrieval-efficient
    – Support bitwise operations
    – Access to data is clustered
    – All dimensions are handled symmetrically
  – Cons:
    – Range queries
    – Increased space overhead of storing the bitmaps, especially for high-cardinality data
    – Expensive batch update, as all bit-mapped indices have to be modified even for a single row insertion
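A minimal sketch of a bit-mapped index may make these trade-offs concrete. It uses Python integers as bitmaps (bit i set means row i has that value); the class, method names and row layout are assumptions made for illustration, not a description of any particular product.

```python
class BitmapIndex:
    def __init__(self, dimensions):
        self.dimensions = list(dimensions)
        # One bitmap per distinct value of each dimension.
        self.bitmaps = {d: {} for d in self.dimensions}
        self.row_count = 0

    def insert(self, row):
        rid = self.row_count
        # A single row insertion touches the index of every dimension.
        for d in self.dimensions:
            value = row[d]
            self.bitmaps[d][value] = self.bitmaps[d].get(value, 0) | (1 << rid)
        self.row_count += 1
        return rid

    def equals(self, dimension, value):
        return self.bitmaps[dimension].get(value, 0)

    def in_range(self, dimension, low, high):
        # A range predicate ORs together one bitmap per value in the range.
        result = 0
        for value, bits in self.bitmaps[dimension].items():
            if low <= value <= high:
                result |= bits
        return result

def row_ids(bits):
    """Decode a bitmap into the list of row ids whose bit is set."""
    ids, rid = [], 0
    while bits:
        if bits & 1:
            ids.append(rid)
        bits >>= 1
        rid += 1
    return ids
```

Combining predicates on different dimensions is a single bitwise AND, for example `idx.equals("product", "soap") & idx.equals("store", "NY")`, and no dimension is privileged over another.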
Existing methods (cont.)
– Bit-mapped indices variants
  – Compression
  – Hybrid
  – Dynamic bitmaps
Existing methods (cont.)
– Hierarchical indices
  – Example: Product - Store
    – Index product first; also store summaries at the product level
    – For each product value, create an index on store and store summaries at the product-store level
  – Pros:
    – Allows faster access to higher-level data
    – Dimensions are symmetrically handled
  – Cons:
    – Widely used index storage overhead
    – The average retrieval efficiency can suffer because of the large indexing structure
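The Product - Store example can be sketched with nested dictionaries. This is an illustration only: the class name, the choice of a summed `sales` measure and the method names are assumptions.

```python
class HierarchicalIndex:
    def __init__(self):
        # Top level: one node per product value.
        self.products = {}

    def add(self, product, store, sales):
        node = self.products.setdefault(product, {"total": 0, "stores": {}})
        # Summary kept at the product level.
        node["total"] += sales
        # Per-product index on store, with the product-store summary.
        node["stores"][store] = node["stores"].get(store, 0) + sales

    def product_total(self, product):
        # Higher-level data is answered without visiting the store level.
        return self.products[product]["total"]

    def product_store_total(self, product, store):
        return self.products[product]["stores"][store]
```

Every product node carries its own store index plus summaries, which is where the storage overhead of the structure comes from.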
Existing methods (cont.)
– Multidimensional indices
  – Use of index methods designed for spatial data, e.g. R-Tree, Grid Files, etc.
Optimized R-Tree for OLAP data
– Rectangular dense regions (only the boundaries of regions that contain more than a threshold number of points)
  – Each contains a pointer to a variable-length array of TIDs or the tuples themselves
  – Points in sparse regions
– Finding dense regions
  – Ask an expert?
  – Use of a clustering algorithm (similar algorithms: image analysis)
– Needs evaluation!
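To make the dense/sparse split concrete, here is a rough sketch that uses a fixed grid as a stand-in for the clustering step; it is not the method proposed in the talk. The function name, the two-dimensional integer coordinates, `cell_size` and `threshold` are all assumptions.

```python
from collections import defaultdict

def split_dense_sparse(points, cell_size, threshold):
    """Split 2-D integer points into dense rectangles and sparse points.

    A grid cell holding more than `threshold` points becomes a dense
    region stored only by its boundaries plus an array of tuple ids
    (TIDs); all other points are kept individually.
    """
    cells = defaultdict(list)
    for tid, (x, y) in enumerate(points):
        cells[(x // cell_size, y // cell_size)].append(tid)

    dense_regions, sparse_points = [], []
    for (cx, cy), tids in cells.items():
        if len(tids) > threshold:
            bounds = (
                cx * cell_size, cy * cell_size,
                (cx + 1) * cell_size - 1, (cy + 1) * cell_size - 1,
            )
            dense_regions.append({"bounds": bounds, "tids": tids})
        else:
            sparse_points.extend((points[t], t) for t in tids)
    return dense_regions, sparse_points
```

The dense rectangles and the remaining sparse points would then be the entries inserted into the R-Tree, so a dense region costs one entry regardless of how many tuples it holds.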
R-Tree vs. bit-mapped indices
– R-Tree pros:
  – Allows range queries
  – Smaller space overhead
  – Update is more efficient
– Bit-mapped pros:
  – Faster bitwise operations
  – Efficient for low cardinality, few restricted dimensions, and sparse data