35
Variant Indexes

Variant Indexes

  • Upload
    edmund

  • View
    68

  • Download
    4

Embed Size (px)

DESCRIPTION

Variant Indexes. Specialized Indexes?. Data warehouses are large databases with data integrated from many independent sources. Queries are often complex. Updated in a batch fashion. Goal : develop specialized indexes to speed up query evaluation. Bitmap Indexes. - PowerPoint PPT Presentation

Citation preview

Page 1: Variant Indexes

Variant Indexes

Page 2: Variant Indexes

Specialized Indexes?

• Data warehouses are large databases with data integrated from many independent sources.

• Queries are often complex.

• Updated in a batch fashion.

• Goal: develop specialized indexes to speed up query evaluation.

Page 3: Variant Indexes

Bitmap Indexes

• Value-list index – same row may appear under many key values in index

• RID-list – list a key value once at head, which is followed by a long list of RIDs for rows with this key value.

• Bitmap – alternate form for representing RID-lists in a Value-list index

• Bitmaps are more space efficient than RID-lists when the number of key values is low (#1)

Page 4: Variant Indexes

Creating Bitmap Index

• For n rows of T = {r1,…,rn}, do 1-1 mapping from T to Z[M}, the 1st M positive integers.

• For a given row r with row # j, the table page number accessed to retrieve row r is j/p and the page slot is j%p.

Page 5: Variant Indexes

• A Bitmap B is a sequence of M bits. Each row r with row # j that has property P, we set bit j in B to 1, all other bits are zero.

• A Bitmap index for a column C with values v1,…,vk is a B+ tree with entries having these key values and associated data portions that contain Bitmaps for the properties C=v1,…,C=vk.

Page 6: Variant Indexes

• Dense Bitmaps – if the proportion of 1-bits in the Bitmap is large.

• A Bitmap index for a column with 32 values will have average density 1/32. Space needed for Bitmap is comparable to disk space needed for RID-list index, which requires 32 bits for each RID present.

• For a column index with a very small # of values, the Bitmaps will have high densities and disk savings is enormous.

Page 7: Variant Indexes

• Boolean operations are extremely fast.

• Can use a bunch of for loops to execute. For long Bitmaps, take fragments out from disk. (#2)

• Loops for Bitmap AND, OR, NOT, or COUNT are extremely fast compared to loop operations on RID lists. (#3)

Page 8: Variant Indexes

Segmentation

• To optimize Bitmap index access, can be broken into fragments of equal sizes

• RIDs can be small # bytes in length, they specify position in segment. At beginning, high order bits can be specified. (#4)

• If one of predicate indexes has no pointer to a Bitmap Fragment for a segment, then the segment fragments for the other indexes can be ignored as well, can save a lot of I/O.

Page 9: Variant Indexes

• Bitmap representations and RID-list representations are interchangeable, when Bitmap representations are dense, Bitmaps are much more efficient than RID-lists, for storage use and Boolean operations. When too sparse, Bitmap index can contain RID-list.

Page 10: Variant Indexes

Projection Indexes

• Say C is a column of a table T; then the Projection index on C consists of a stored sequence of column values from C, in order by row number in T from which the values are extracted.

Page 11: Variant Indexes

• Projection index very efficient when column values must be retrieved from all rows of a found set.

• For example, say column values are 4 bytes in length, 1000 values will fit on a 4KB page. Say each row is 200 bytes, then using index requires 1/20 I/0s of getting whole rows.

Page 12: Variant Indexes

• Like vertical partitioning, good when small # of columns are retrieved by most SELECT statements, bad when most queries retrieve most of the columns. (#5)

Page 13: Variant Indexes

Bit-Sliced Indexes

• A Bit-Sliced index stores a set of “Bitmap slices” which are orthogonal to the data held in a Projection index. Good for calculating aggregates in a found set.

• Each individual Bitmap Bi is called a Bit-slice of the column. There is one Bitmap for Bnn for “not null” and one Bitmap for Bn for “null” (#6)

Page 14: Variant Indexes

Comparing Indexes for Aggregate Evaluation

• SELECT SUM (dollar_sales)

FROM SALES

WHERE condition

Page 15: Variant Indexes

• Say found set Bf (2 million rows) is already determined.

• SALES table has 100 million rows, which are each 200 bytes in length, stored 20 rows on a 4KB disk page.

Page 16: Variant Indexes

Query Plan 1

• Direct access to rows to calculate SUM

• Each disk page contains 20 rows, have 5,000,000 disk pages occupied by SALES table (100 millon rows / 20 rows). Since Bf represents only 1/50 rows of SALES table, # disk pages occupies is: 5,000,000(1 – e^-2,000,000/5,000,000) = 1,648,4000.

Page 17: Variant Indexes

• This means that, if one disk arm retrieves 100 disk pages per second, you will need 4 hours of disk arm use! (1,648,400/100 = 16,484 seconds = 4 hours)

Page 18: Variant Indexes

Query Plan 2

• Get SUM with Projection Index

• The dollar_sales Projection index has 1000 values per 4KB disk page. Requires 100,000 disk pages. This equals 1000 seconds (17 minutes)

Page 19: Variant Indexes

Query Plan 3

• Get SUM with Value-list index on dollar_sales.• Can calculate SUM(dollar_sales) for our Foundset

by ranging through all possible values in the index and determining the rows with each value, the determining how many rows with each value are in the Foundset, and finally multiplying that count by the value and adding to the SUM.

Page 20: Variant Indexes

• IF (COUNT (Bf AND Bnm) == 0)

Return null

SUM = 0.00

For each non-null value v in index for C {

Designate set of rows with value v as Bv

SUM += v * COUNT(Bf AND Bv);

}

return SUM;

Page 21: Variant Indexes

• Good if entire Bitmaps are in memory.

• Value-list index evaluation of SUM ( C ) requires 10,000 Bitmap ANDs and 10,000 COUNTS.

• Requires 103,125 I/Os and a loop of length 100,000,000.

• Requires 1,340,640 disk pages

Page 22: Variant Indexes

Query Plan 4• Get SUM with Bit-Sliced index• IF (COUNT (Bf AND Bnn) == 0)

Return null

SUM = 0.00

For I = 0 to N

SUM += 2 (i power) * COUNT (Bi AND Bf);

return SUM

Page 23: Variant Indexes

• Perform 21 ANDs and 21 COUNTs of 100,000,000 bit Bitmaps. Each Bitmap is 12.5 Mbytes in length, requiring 3125 I/Os, but we assume that Bf can remain in memory. Thus need to read 22 Bitmaps from disk – 22*3125 = 68,750 I/Os. For CPU, need to AND 21 pairs of Bitmaps, total # of loop passes: 21*(100,000,000/32) = 65,625, 000. Need to perform 21 COUNTs, with 131,250,000 passes.

Page 24: Variant Indexes

Comparing Algorithm Performance

METHOD I/0 CPU contributionsAdd from rows 1,342 K I/O + 2M*(25 ins)Projection index 100 K I/O + 2M*(10 ins)Value-List index 103 K I/O + 100M*(10 ins)Bit-Sliced index 69 K I/O + 197M*(1 ins)

Page 25: Variant Indexes

Aggregate Functions & Index Types for Evaluation

Aggregate Value-List Projection Bit-SlicedCOUNT Not needed Not needed Not neededSUM Not Bad Good BestAVG (SUM/COUNT) Not Bad Good BestMAX & MIN Best Slow SlowMEDIAN, N-TILEBest No SometimesColumn-Product Slow Best Slow

Page 26: Variant Indexes

Range Predicates

• Consider:SELECT target-list FROM TWHERE C-range AND <condition>;

Want to further restrict Foundset Bf so that the compound predicate “C-range AND <condition>” holds. This will be represented by BF.

Page 27: Variant Indexes

Using Projection Index

• Can create BF by accessing each C value in the index corresponding to a row number in BF and testing whether it lies within the specified range.

Page 28: Variant Indexes

Value-List Index

• Br = the empty set

For each entry v in the Index for C that satisfies the range specified

Designate the set of rows with the value v as Bv

Br = Br OR Bv

BF = Bf AND Br

(Br needs to be in memory all times for efficiency)

Page 29: Variant Indexes

Bit-Sliced Index

• The algorithm is efficient, it comprises a bunch of Boolean operations for each bit-slice of a column C.

Page 30: Variant Indexes

Range Evaluation Performance by Index Type

Range Evaluation Value-List Projection Bit-Sliced

Narrow Range Best Good Good

Wide Range Not Bad Good Best

Page 31: Variant Indexes

Join Indexes

• A Join index is an index on one table for a quantity that involves a column value of a different table through a commonly encountered join.

• Join indexes can be used to avoid actual joins of tables, or to greatly reduce the volume of data that must be joined, by performing restrictions in advance.

Page 32: Variant Indexes

Calculating Groupset Aggregates

• Once the Foundset has been computed from the Where clause, the bits in Foundset must be partitioned into Groupsets. Any aggregate functions are then evaluated separately over these different Groupsets.

• How to compute Groupset aggregates using our different index types.

Page 33: Variant Indexes

Projection Indexes

• Assume Projection indexes exist for each of the group-by columns.

• For each Foundset, read appropriate Projection indexes. Read values of the columns to be aggregated from index, and aggregate the result into the proper cell of the memory-resident array.

Page 34: Variant Indexes

Value-List Indexes

• Grouping by columns D1.A, D2.B using Value-List index:

For each entry v1 in the Value-List index for D1.A

For each entry v2 in the Value-List index for D2.B

Bg = Bv1 AND Bv2 AND Bf

Evaluate AGG( F.C ) on Bg

Page 35: Variant Indexes

Conclusion

• For read-only systems, we could use more complex index structures to speed up queries. The Bitmap, Projection, and Bit-Slice indexes serve this purpose.