  • Overview - Teradata Secondary Indexes

    RETAIL & CPG

    PreetamPadhy [email protected]

  • Overview Teradata Secondary Indexes 2012

    TCS Confidential Page 1

    Introduction: Secondary Indexes (SIs) are a distinctive feature of Teradata that gives the system an alternate access path to table rows. They are generally defined to provide faster set selection. The Teradata RDBMS allows up to 32 SIs per table. There are two types of secondary indexes:

    1. Unique Secondary Indexes (USIs)
    2. Non-Unique Secondary Indexes (NUSIs)

    The system maintains a separate subtable for each secondary index. Each subtable row stores the row hash of the secondary index value, the index column values, and the RowID(s) that point to the base table row(s) with that value. Users cannot access subtables directly. Secondary indexes can be defined for a new table using CREATE TABLE or for an existing table using CREATE INDEX.
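
    The subtable layout described above can be pictured with a small toy model. This is only a sketch under stated assumptions: `row_hash` uses CRC32 as a stand-in for Teradata's real hashing algorithm, and `IndexSubtableRow`, `build_subtable`, and the sample rows are hypothetical names invented for illustration, not part of any Teradata API.

```python
import zlib
from dataclasses import dataclass, field


def row_hash(value) -> int:
    """Stand-in hash (assumption: CRC32 is NOT Teradata's real algorithm)."""
    return zlib.crc32(repr(value).encode("utf-8"))


@dataclass
class IndexSubtableRow:
    """One toy subtable row: index row hash, index value, base table RowIDs."""
    index_row_hash: int
    index_value: object
    base_row_ids: list = field(default_factory=list)


def build_subtable(base_rows, index_column):
    """Group base table RowIDs by the value of the indexed column."""
    subtable = {}
    for row_id, row in base_rows.items():
        value = row[index_column]
        entry = subtable.setdefault(value, IndexSubtableRow(row_hash(value), value))
        entry.base_row_ids.append(row_id)
    return subtable


# Hypothetical base table: RowID (row hash, uniqueness value) -> row.
base_rows = {
    (101, 1): {"emp_id": 1, "dept": "sales"},
    (202, 1): {"emp_id": 2, "dept": "sales"},
    (303, 1): {"emp_id": 3, "dept": "hr"},
}

usi = build_subtable(base_rows, "emp_id")  # unique: one RowID per value
nusi = build_subtable(base_rows, "dept")   # non-unique: possibly many RowIDs
```

    In this model a USI entry always points to exactly one RowID, while a NUSI entry such as `nusi["sales"]` can point to several.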

    Fig 1: Access mechanism of primary index and secondary index

    TYPES OF SECONDARY INDEXES:

    Unique Secondary Index (USI): USIs are always preferable to NUSIs for access using a single value. The usual criterion for choosing between them is the intended application, since the data to be indexed is usually either inherently unique or not. USIs are useful both for base table access (because USI access is, at worst, a two-AMP operation) and for enforcing data integrity by applying a uniqueness constraint on a column set. Like a unique primary index, a unique secondary index can be used to guarantee the uniqueness of each value in a column set.

    USI ACCESS: USI access is usually a two-AMP operation because a USI value typically hashes to a different AMP than the PI value for the same row. If the USI subtable row hashes to the same AMP as the base table row it points to, then only one AMP is accessed. A USI base table row access involves the following stages:

    1. The requested USI value is hashed to locate its row in the USI subtable.
    2. The pointer to the base table row is read and used to access the stored row directly.

    In the request message, the Subtable ID portion of the Table ID references the USI subtable, not the data table. Using the DSW (Destination Selection Word) for the Row Hash, the Message Passing Layer (a.k.a. Communication Layer) directs the message to the correct AMP. That AMP uses the Table ID and Row Hash as a logical index block identifier, and the Row Hash and USI value as the logical index row identifier. If the AMP succeeds in locating the index row, it extracts the base table Row ID. The Subtable ID portion of the Table ID is then modified to refer to the base table, and a new three-part message is put onto the Communication Layer. Once again, the Message Passing Layer uses the DSW to identify the correct AMP. That AMP now uses the Table ID and Row Hash to locate the correct data block, and then uses the Row Hash and Uniqueness Value (Row ID) to locate the correct row.

    Fig 2: USI Access
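
    The two-step USI lookup can be simulated with a toy multi-AMP model. This is a minimal sketch, not Teradata's implementation: the CRC32 hash, the modulo mapping from hash to AMP (standing in for the DSW and hash map), and the `ToySystem` class are all assumptions made for illustration.

```python
import zlib

NUM_AMPS = 4  # hypothetical system size


def row_hash(value) -> int:
    """Stand-in hash (assumption: CRC32 is NOT Teradata's real algorithm)."""
    return zlib.crc32(repr(value).encode("utf-8"))


def amp_for(hash_value: int) -> int:
    """Stand-in for the DSW / hash map lookup: pick an AMP from a row hash."""
    return hash_value % NUM_AMPS


class ToySystem:
    def __init__(self):
        self.base = [dict() for _ in range(NUM_AMPS)]  # per AMP: RowID -> row
        self.usi = [dict() for _ in range(NUM_AMPS)]   # per AMP: USI value -> RowID

    def insert(self, row, pi_col, usi_col):
        # The base row is placed by the hash of its primary index value.
        pi_hash = row_hash(row[pi_col])
        base_amp = amp_for(pi_hash)
        row_id = (pi_hash, len(self.base[base_amp]) + 1)  # (row hash, uniqueness value)
        # The USI subtable row is placed by the hash of the USI value.
        usi_value = row[usi_col]
        index_rows = self.usi[amp_for(row_hash(usi_value))]
        if usi_value in index_rows:
            raise ValueError("duplicate USI value")  # uniqueness enforcement
        self.base[base_amp][row_id] = row
        index_rows[usi_value] = row_id

    def usi_lookup(self, usi_value):
        """Return (row, AMPs touched); at most two AMPs are involved."""
        index_amp = amp_for(row_hash(usi_value))
        touched = [index_amp]
        row_id = self.usi[index_amp].get(usi_value)
        if row_id is None:
            return None, touched
        base_amp = amp_for(row_id[0])  # second message, routed by base row hash
        if base_amp != index_amp:
            touched.append(base_amp)
        return self.base[base_amp][row_id], touched


system = ToySystem()
for i in range(20):
    system.insert({"emp_id": i, "badge": f"B{i:03d}"}, pi_col="emp_id", usi_col="badge")
```

    Calling `system.usi_lookup("B007")` returns the matching row together with a list of one or two AMP numbers, depending on whether the index row and the base row happen to land on the same AMP.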

    Non Unique Secondary Index (NUSI): NUSIs are particularly useful for range access and for equality and non-equality conditions. Highly selective NUSIs are useful for reducing the cost of frequently made selections and joins on non-unique columns, and provide extremely fast access for equality conditions. However, NUSIs with low selectivity can be less efficient than a full-table scan. NUSIs are implemented on an AMP-local basis: each AMP is responsible for maintaining only those NUSI subtable rows that correspond to base table rows located on that AMP. Since NUSIs allow duplicate index values and are based on different columns than the PI, data rows matching the supplied NUSI value could appear on any AMP. An AMP that has no index row for the NUSI value does not access the base table at all.

    NUSIs are a less preferable secondary index choice for other applications for several reasons. NUSI access is always an all-AMPs operation. Because NUSI subtable rows are not hash-distributed across the AMPs, each AMP must search its own subtable to locate the relevant pointers to base table rows. This is a fast lookup when a NUSI is specified in an equality condition because the NUSI rows are hash-ordered on each AMP.

    NUSI ACCESS: Because NUSIs are AMP-local indexes, the request message is broadcast to all AMPs. Each AMP uses the supplied values to search the appropriate index block for a corresponding NUSI row. Only those AMPs with one or more of the desired rows use the base table Row IDs to access the proper data blocks and data rows. A NUSI value can, by definition, correspond to multiple rows. If a NUSI is not correlated with the primary index of its base table, those rows are distributed among the AMPs in a way that makes the Optimizer less likely to select the NUSI to access them. In any case, when the number of rows per NUSI value approaches or exceeds the number of AMPs in the system, multiple AMPs must be accessed. The usefulness of a NUSI is correlated with the number of rows per value: the more rows per value, the less useful the index.
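
    The AMP-local behaviour of NUSI access can be sketched as follows. This is a toy model under stated assumptions: CRC32 with a modulo stands in for Teradata's hashing and AMP selection, and the `load` and `nusi_lookup` helpers are hypothetical names for illustration only.

```python
import zlib

NUM_AMPS = 4  # hypothetical system size


def amp_for(value) -> int:
    """Stand-in for hashing a primary index value to an AMP (assumption)."""
    return zlib.crc32(repr(value).encode("utf-8")) % NUM_AMPS


def load(rows, pi_col, nusi_col):
    """Distribute rows by PI; each AMP indexes only its own rows."""
    amps = [{"base": {}, "nusi": {}} for _ in range(NUM_AMPS)]
    for row in rows:
        amp = amps[amp_for(row[pi_col])]
        row_id = len(amp["base"]) + 1
        amp["base"][row_id] = row
        amp["nusi"].setdefault(row[nusi_col], []).append(row_id)
    return amps


def nusi_lookup(amps, value):
    """Broadcast to every AMP; only AMPs with an index row read base rows."""
    probed, base_accessed, result = 0, 0, []
    for amp in amps:
        probed += 1                      # every AMP searches its NUSI subtable
        row_ids = amp["nusi"].get(value)
        if not row_ids:
            continue                     # no index row: base table untouched
        base_accessed += 1
        result.extend(amp["base"][row_id] for row_id in row_ids)
    return result, probed, base_accessed


rows = [{"emp_id": i, "dept": "sales" if i % 5 == 0 else "ops"} for i in range(40)]
amps = load(rows, pi_col="emp_id", nusi_col="dept")
sales_rows, probed, base_accessed = nusi_lookup(amps, "sales")
```

    Here `probed` always equals the number of AMPs, while `base_accessed` counts only the AMPs that actually held a matching index row.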

    1. Single NUSI Access (Between, Less Than, or Greater Than): The Teradata RDBMS accesses data from a NUSI-defined column in one of two ways:

    o Use the NUSI and do a Full Table Scan (FTS) of the NUSI subtable. In this case, the Row IDs of the qualifying base table rows are retrieved into spool, and the Teradata RDBMS uses those Row IDs to access the base table rows themselves. If the NUSI is not value-ordered, the system may do an FTS of the NUSI subtable. If the NUSI is value-ordered, the NUSI subtable is much more likely to be used to locate matching base table rows.

    o Ignore the NUSI and do an FTS of the base table itself.
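
    The difference between scanning a NUSI subtable that is not value-ordered and searching a value-ordered one can be illustrated with a toy example. This is only a sketch: the dictionary-based subtable, the function names, and the sample values are assumptions for illustration, and a binary search over a sorted list merely approximates what value ordering makes possible.

```python
import bisect


def range_scan_unordered(subtable, low, high):
    """Not value-ordered: every index row must be examined (subtable FTS)."""
    examined, row_ids = 0, []
    for value, ids in subtable.items():
        examined += 1
        if low <= value <= high:
            row_ids.extend(ids)
    return sorted(row_ids), examined


def range_scan_value_ordered(sorted_values, subtable, low, high):
    """Value-ordered: binary search finds the qualifying slice directly."""
    start = bisect.bisect_left(sorted_values, low)
    stop = bisect.bisect_right(sorted_values, high)
    row_ids = []
    for value in sorted_values[start:stop]:
        row_ids.extend(subtable[value])
    return sorted(row_ids), stop - start


# Hypothetical NUSI subtable on one AMP: index value -> base table Row IDs.
subtable = {v: [v * 10, v * 10 + 1] for v in (42, 7, 19, 88, 3, 61, 25)}
sorted_values = sorted(subtable)

# Equivalent of: WHERE indexed_col BETWEEN 10 AND 30
unordered_ids, unordered_examined = range_scan_unordered(subtable, 10, 30)
ordered_ids, ordered_examined = range_scan_value_ordered(sorted_values, subtable, 10, 30)
```

    Both functions return the same Row IDs, but the unordered scan examines all seven index rows while the value-ordered search examines only the two that qualify.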

    2. Dual NUSI Access:

    Two NUSIs are created on separate columns of the table. The Teradata RDBMS decides how to use these NUSIs based on their selectivity.

    a) AND with Equality Conditions:

    o If one of the two indexes is strongly selective, the system uses it alone for access.

    o If both indexes are weakly selective, but together they are strongly selective, the system does a bit-map intersection.

    o If both indexes are weakly selective separately and together, the system does an FTS.
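
    The three AND cases can be sketched with Python integers used as bit maps. This is a toy illustration under stated assumptions: the `threshold` cut-off for "strongly selective" is a hypothetical value, the function names are invented here, and the combined selectivity is measured directly from the data, whereas a real optimizer would estimate it from statistics.

```python
def to_bitmap(row_numbers):
    """Set bit n for every qualifying row number n."""
    bitmap = 0
    for n in row_numbers:
        bitmap |= 1 << n
    return bitmap


def from_bitmap(bitmap):
    """Recover the sorted row numbers whose bits are set."""
    rows, n = [], 0
    while bitmap:
        if bitmap & 1:
            rows.append(n)
        bitmap >>= 1
        n += 1
    return rows


def bitmap_intersection(rows_a, rows_b):
    """Rows that satisfy both NUSI conditions."""
    return from_bitmap(to_bitmap(rows_a) & to_bitmap(rows_b))


def choose_and_plan(rows_a, rows_b, total_rows, threshold=0.1):
    """Pick an access strategy; threshold is a hypothetical cut-off."""
    sel_a = len(rows_a) / total_rows
    sel_b = len(rows_b) / total_rows
    if sel_a <= threshold or sel_b <= threshold:
        return "single NUSI"
    combined = bitmap_intersection(rows_a, rows_b)
    if len(combined) / total_rows <= threshold:
        return "bitmap intersection"
    return "full-table scan"
```

    For example, with 100 rows, two conditions that each match 40 rows but share only 5 of them lead to the bit-map intersection plan in this model.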

    b) OR with Equality Conditions: When accessing data with two NUSI equality conditions joined by OR, the Teradata RDBMS may do one of the following:

    o Do an FTS of the base table.

    o If each of the NUSIs is strongly selective, use each of the NUSIs to return the appropriate rows.

    o Do an FTS of the two NUSI subtables and then perform the following steps:

      o Retrieve the Row IDs of qualifying base table rows into two separate spools.

      o Eliminate duplicates from the two spools of Row IDs.

      o Access the base rows from the resulting spool of Row IDs.
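
    The spool-based OR steps can be sketched as follows. This is a minimal illustration, assuming in-memory dictionaries in place of real subtables and spool files; `build_nusi`, `or_access`, and the sample table are hypothetical names and data.

```python
def build_nusi(base_table, column):
    """Toy NUSI subtable: index value -> list of base table Row IDs."""
    nusi = {}
    for row_id, row in base_table.items():
        nusi.setdefault(row[column], []).append(row_id)
    return nusi


def or_access(base_table, nusi_a, value_a, nusi_b, value_b):
    # Step 1: retrieve qualifying Row IDs into two separate spools.
    spool_a = list(nusi_a.get(value_a, []))
    spool_b = list(nusi_b.get(value_b, []))
    # Step 2: eliminate duplicates across the two spools.
    merged = sorted(set(spool_a) | set(spool_b))
    # Step 3: access the base rows from the resulting spool of Row IDs.
    return [base_table[row_id] for row_id in merged]


base_table = {
    1: {"dept": "sales", "city": "Pune"},
    2: {"dept": "hr", "city": "Pune"},
    3: {"dept": "sales", "city": "Delhi"},
    4: {"dept": "ops", "city": "Delhi"},
}
nusi_dept = build_nusi(base_table, "dept")
nusi_city = build_nusi(base_table, "city")

# Equivalent of: WHERE dept = 'sales' OR city = 'Pune'
rows = or_access(base_table, nusi_dept, "sales", nusi_city, "Pune")
```

    Row 1 satisfies both conditions and appears in both spools, so the duplicate elimination step ensures it is returned only once.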

    Covering Indexes: If the query references only columns of that table that are fully contained within a given index, the index is said to "cover" the table in the query. In these cases, it is often more efficient to access only the index subtable and avoid accessing the base table rows altogether. Covering will be considered for any table in the query that references only columns defined in a given NUSI. These columns can be specified anywhere in the query, including the:

    o SELECT list

    o WHERE clause

    o Aggregate functions

    o GROUP BY expressions

    The presence of a WHERE condition on each indexed column is not a prerequisite for using the index to cover the query. The optimizer will consider the legality and cost of covering versus other alternative access paths and choose the optimal plan. Many of the potential performance gains from index covering require no user intervention and will be transparent except for the execution plan returned by EXPLAIN.
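
    The covering test itself is a subset check, which can be sketched as below. This is only an illustration: `covering_indexes` and the index names are hypothetical, and the sketch ignores the cost comparison that the optimizer performs before actually choosing a covering plan.

```python
def covering_indexes(referenced_columns, indexes):
    """Return the names of indexes that contain every referenced column."""
    needed = set(referenced_columns)
    return [name for name, columns in indexes.items() if needed <= set(columns)]


# Hypothetical NUSIs defined on one table: index name -> indexed columns.
indexes = {
    "nusi_dept_salary": ("dept", "salary"),
    "nusi_city": ("city",),
}

# SELECT dept, SUM(salary) ... GROUP BY dept  references dept and salary only.
covered = covering_indexes({"dept", "salary"}, indexes)

# SELECT dept, name ...  references a column that no index contains.
not_covered = covering_indexes({"dept", "name"}, indexes)
```

    A query that references only `salary` is also covered by `nusi_dept_salary` in this model, which reflects the point that a WHERE condition on each indexed column is not required.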

    Secondary Index Considerations: SIs require additional storage to hold their subtables. If the base table is a Fallback table, its SI subtables are also Fallback, so twice the additional storage space is required. SIs also require additional I/O to maintain these subtables.
