99
Last Updated : 21 st Dec 2004 Center of Excellence Data Warehousing Group Advanced Teradata Concepts

Advanced Teradata Concepts

Embed Size (px)

DESCRIPTION

imp

Citation preview

Page 1: Advanced Teradata Concepts

Last Updated : 21st Dec 2004

Center of ExcellenceData Warehousing

Group

Advanced Teradata Concepts

Page 2: Advanced Teradata Concepts

TopicsTopics

Primary and Secondary Indexes

Join Processing

Join Indexes

Hash Indexes

Partitioned Primary Indexes

Collect Statistics

Priority Scheduler

Teradata Dual Active Server

Page 3: Advanced Teradata Concepts

Primary and Secondary Indexes

Page 4: Advanced Teradata Concepts

IndexesIndexes

Teradata provides numerous indexing options that can improve query performance for different types of queries and workloads. Following kinds of indexes are available:

Primary IndexSecondary Indexes Join IndexesHash IndexesPartitioned Primary Indexes.

Page 5: Advanced Teradata Concepts

Primary IndexesPrimary Indexes

In Teradata, Primary Index is a mechanism to assign and store a data row in an AMP.

Since primary index is used to store data rows, retrieving data using primary index is very efficient.

Primary Index can be Unique or Non-Unique.

Choosing Primary Index is critical as it affects the data distribution across the processing units (AMPs) and hence affects the performance.

Page 6: Advanced Teradata Concepts

Primary Index Choice CriteriaPrimary Index Choice Criteria

Access Demographics – Choose the column most frequently used for access to maximize the number of one AMP operations.

Distribution Demographics – Better distribution optimizes parallel processing.

Volatility – Changing PI may cause the row itself to be moved to another AMP. Stable PI reduces data movement overhead.

Page 7: Advanced Teradata Concepts

UPI and NUPIUPI and NUPI

UPI Best distribution due to unique value. One AMP operation and uses only one I/O. Best performance.

NUPI Good distribution for ‘near unique’ values. Duplicate PI rows go to same block. No extra

I/O if all duplicate rows fit in single block. Duplicate row check required if there is no

USI defined. Multiple I/Os required if rows do not fit in a

single data block.

Page 8: Advanced Teradata Concepts

UPI and NUPI (cont.)UPI and NUPI (cont.)

Highly non-unique values cause skewed distribution.

Highly non-unique values cause extra overhead in duplicate row check.

Multi-Column PI gives better distribution.

But as the number of column increases the index becomes less usable.

Partial values can not be used for PI access.

Do not include a column for index selection that does not improve the selectivity of the index.

Page 9: Advanced Teradata Concepts

Secondary IndexesSecondary Indexes

Secondary Index values are stored in sub tables.May be unique or non unique.Teradata implements USI and NUSI differently.

SI Value BT Row ID

SI value BT Row ID

Sec. Index value Hash Algorithm

Base Table

Index Subtable

Page 10: Advanced Teradata Concepts

Secondary IndexesSecondary Indexes

USI are hash distributed across all AMPs.Sub table rows may reside in a AMP other than

the base table row.USI access involved two-AMP operation.

NUSI are implemented on a AMP local basis.Sub table rows located in the same AMP of

base table rows.NUSI access involved all-AMP operation.

Page 11: Advanced Teradata Concepts

Secondary Index ConsiderationsSecondary Index Considerations

Need additional storage to hold sub-table.

Need additional I/O.

Choose columns for NUSI candidate only those having frequent access.

If “COLLECTed STATISTICS” are not available Teradata may not choose NUSI as the access path.

Use EXPLAIN facility to see the plan chosen by the optimizer.

Page 12: Advanced Teradata Concepts

NUSI Bit MappingNUSI Bit Mapping

Used when multiple NUSI are being used will AND condition.

Identifies common Row Ids in the satisfied by the query before retrieving the base table rows.

SI Value Row Id Indx1

Indx2

Multiple-column secondary indexes are less usable. Define multiple secondary indexes to allow bit mapping.

Page 13: Advanced Teradata Concepts

Row Access MethodsRow Access Methods

PI Value

Hashing Algorithm

Base Table Base Table Base Table Base Table

Sub Table Sub Table Sub Table Sub Table

NUSI Value

Hashing Algorithm

USI Value

Hashing Algorithm

Value

Hashing Algorithm

PI/NUPI USI NUSI FTS

Page 14: Advanced Teradata Concepts

Value Ordered NUSIValue Ordered NUSI

NUSI sub-tables are local to the AMP corresponding to its base table and, by default, are sorted in row hash of the secondary index column.

Value Ordered NUSI sub-tables are sorted by secondary index column-value rather than its row hash.

Value Ordered NUSI are efficient for processing queries with range conditions and inequality conditions on the secondary index column.

Page 15: Advanced Teradata Concepts

Value Ordered NUSI (Cont.)Value Ordered NUSI (Cont.)

SELECT * FROM LineItem WHERE L_Shipdate < DATE '1992-02-28';

CREATE INDEX Indx_Shipdate(L_Shipdate) on LineItem;

…… 3) We do an all-AMPs RETRIEVE step from TPCH.lineitem by way of an all-rows scan with a condition of ("TPCH.lineitem.L_SHIPDATE < DATE '1992-02-28'") into Spool 1 (group_amps), which is built locally on the AMPs. The input table will not be cached in memory, but it is eligible for synchronized scanning. The size of Spool 1 is estimated with high confidence to be 1,764 rows. The estimated time for this step is 2.58 seconds.……

Optimizer did not choose Secondary Index. Why ?

Page 16: Advanced Teradata Concepts

Value Ordered NUSI (Cont.)Value Ordered NUSI (Cont.)

SELECT * FROM Lineitem WHERE L_Shipdate < DATE '1992-02-28';

CREATE INDEX Indx_Shipdate(L_Shipdate) ORDER BY VALUES ON Lineitem;

…… 3) We do an all-AMPs RETRIEVE step from TPCH.lineitem by way of a traversal of index # 4 with a range constraint of ( "TPCH.lineitem.Field_1035 < DATE '1992-02-28'") extracting row ids only into Spool 2 (all_amps), which is built locally on the AMPs. Then we do a SORT to order Spool 2 by row id eliminating duplicate rows. The size of Spool 2 is estimated to be 1,764 rows. The estimated time for this step is 0.04 seconds. 4) We do an all-AMPs RETRIEVE step from TPCH.lineitem by way of row ids from Spool 2 (Last Use) with no residual conditions into Spool 1 (group_amps), which is built locally on the AMPs. The input table will not be cached in memory, but it is eligible for synchronized scanning. The size of Spool 1 is estimated with high confidence to be 1,764 rows. The estimated time for this step is 2.58 seconds.……

Page 17: Advanced Teradata Concepts

Join Processing

Page 18: Advanced Teradata Concepts

Join ProcessingJoin Processing

Each AMP performs join processing in parallel.

Optimizer chooses best join strategy based on Available indexes, andData Demographics (Collect

Statistics/Dynamic Sampling)

Rows must be on the same AMP for matching.

Teradata temporarily moves the rows to same AMP if they are not in the same AMP for join. This is called row redistribution.

Page 19: Advanced Teradata Concepts

Join ProcessingJoin Processing

General Join Scenarios:

Join column is the PI of both the tables.

Join column is PI of one of the tables.

Join column is not a PI of either of the table.

Page 20: Advanced Teradata Concepts

Case 1- PI of both the tablesCase 1- PI of both the tables

Rows taking part in the join are already in the same AMP.

No data movement is necessary.

This is the best-case scenario.

Page 21: Advanced Teradata Concepts

Case 1 - ExampleCase 1 - Example

CREATE SET TABLE EMPLOYEE ( EmpNo SMALLINT Name VARCHAR(12), DeptNo SMALLINT, JobTitle VARCHAR(12), Salary DECIMAL(8,2) DOB DATE, )UNIQUE PRIMARY INDEX ( EmpNo )

CREATE SET TABLE LOCATION ( EmpNo INTEGER, Loc VARCHAR(25) )PRIMARY INDEX ( EmpNo );

SELECT E.EmpNo, E.Name, L.Loc FROM Employee E, Location L WHERE E.EmpNo = L.EmpNo;

Page 22: Advanced Teradata Concepts

Case 1 - Explain Output Case 1 - Explain Output

1) First, we lock a distinct PERSONNEL."pseudo table" for read on a RowHash to prevent global deadlock for PERSONNEL.LOCATION. 2) Next, we lock a distinct PERSONNEL."pseudo table" for read on a RowHash to prevent global deadlock for PERSONNEL.EMPLOYEE. 3) We lock PERSONNEL.LOCATION for read, and we lock PERSONNEL.EMPLOYEE for read. 4) We do an all-AMPs JOIN step from PERSONNEL.EMPLOYEE by way of a RowHash match scan with no residual conditions, which is joined to PERSONNEL.LOCATION. PERSONNEL.EMPLOYEE and PERSONNEL.LOCATION are joined using a merge join, with a join condition of ( "PERSONNEL.EMPLOYEE.EmpNo = PERSONNEL.LOCATION.EmpNo"). The result goes into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 24 rows. The estimated time for this step is 0.04 seconds. 5) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.04 seconds.

Page 23: Advanced Teradata Concepts

Case 2 - PI of one of the tablesCase 2 - PI of one of the tables

One table has its rows on the target AMP.

Rows of the other table need to be redistributed to their target AMPs by the hash code of the join column value.

If the table is small, optimizer may choose to duplicate the table on all AMPs

Page 24: Advanced Teradata Concepts

Case 2 – Example Case 2 – Example

CREATE SET TABLE EMPLOYEE ( EmpNo SMALLINT Name VARCHAR(12), DeptNo SMALLINT, JobTitle VARCHAR(12), Salary DECIMAL(8,2) DOB DATE, )UNIQUE PRIMARY INDEX ( EmpNo )

CREATE SET TABLE Department ( DeptNo SMALLINT, DeptName VARCHAR(14), Loc CHAR(3), MgrNo SMALLINT )UNIQUE PRIMARY INDEX ( DeptNo );

SELECT E.EmpNo, E.Name, D.DeptName FROM Employee E, Department D WHERE E.Deptno = D.DeptNo;

Page 25: Advanced Teradata Concepts

Case 2 – Explain OutputCase 2 – Explain Output

……4) We do an all-AMPs RETRIEVE step from PERSONNEL.EMPLOYEE by way of an all-rows scan with a condition of ("(PERSONNEL.EMPLOYEE.DeptNo >= 100) AND ((PERSONNEL.EMPLOYEE.DeptNo <= 900) AND (NOT (PERSONNEL.EMPLOYEE.DeptNo IS NULL )))") into Spool 2 (all_amps), which is redistributed by hash code to all AMPs. Then we do a SORT to order Spool 2 by row hash. The size of Spool 2 is estimated with no confidence to be 5 rows. The estimated time for this step is 0.03 seconds.5) We do an all-AMPs JOIN step from PERSONNEL.Department by way of a RowHash match scan with no residual conditions, which is joined to Spool 2 (Last Use). PERSONNEL.Department and Spool 2 are joined using a merge join, with a join condition of ("DeptNo = PERSONNEL.Department.DeptNo"). The result goes into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 5 rows. The estimated time for this step is 0.04 seconds.6) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.08 seconds.

Page 26: Advanced Teradata Concepts

Case 3 - not a PI of either of the tableCase 3 - not a PI of either of the table

Rows of both the tables need to be redistributed to their target AMPs by the hash code of the join column value.

Optimizer might choose to duplicate the smaller table on all AMPs.

This join scenario involves maximum number of data movement.

Page 27: Advanced Teradata Concepts

Case 3 - ExampleCase 3 - Example

CREATE SET TABLE Lineitem ( L_ORDERKEY INTEGER NOT NULL, L_PARTKEY INTEGER NOT NULL, L_SUPPKEY INTEGER NOT NULL, L_LINENUMBER INTEGER NOT NULL, L_QUANTITY DECIMAL(15,2) NOT NULL, )PRIMARY INDEX ( L_ORDERKEY );

CREATE SET TABLE Partsupp ( PS_PARTKEY INTEGER NOT NULL, PS_SUPPKEY INTEGER NOT NULL, PS_AVAILQTY INTEGER NOT NULL, PS_SUPPLYCOST DECIMAL(15,2) NOT NULL, PS_COMMENT VARCHAR(199) )PRIMARY INDEX ( PS_PARTKEY );

SELECT L_Suppkey, L_Quantity FROM Lineitem, PartsuppWHERE L_Suppkey = Ps_Suppkey;

Page 28: Advanced Teradata Concepts

Case 3 – Explain OutputCase 3 – Explain Output

……4) We execute the following steps in parallel. 1) We do an all-AMPs RETRIEVE step from TPCH.partsupp by way of an all-rows scan with no residual conditions into Spool 2 (all_amps), which is redistributed by hash code to all AMPs. Then we do a SORT to order Spool 2 by row hash. The size of Spool 2 is estimated with low confidence to be 31,938 rows. The estimated time for this step is 0.79 seconds. 2) We do an all-AMPs RETRIEVE step from TPCH.lineitem by way of an all-rows scan with no residual conditions into Spool 3 (all_amps), which is redistributed by hash code to all AMPs. Then we do a SORT to order Spool 3 by row hash. The result spool file will not be cached in memory. The size of Spool 3 is estimated with low confidence to be 240,480 rows. The estimated time for this step is 6.27 seconds.

Page 29: Advanced Teradata Concepts

Case 3 – Explain Output Cont…Case 3 – Explain Output Cont…

5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a RowHash match scan, which is joined to Spool 3 (Last Use). Spool 2 and Spool 3 are joined using a merge join, with a join condition of ("L_SUPPKEY = PS_SUPPKEY"). The result goes into Spool 1 (group_amps), which is built locally on the AMPs. The result spool file will not be cached in memory. The size of Spool 1 is estimated with no confidence to be 15,661,999 rows. The estimated time for this step is 1 minute and 1 second. 6) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 1 minute and 7 seconds.

Page 30: Advanced Teradata Concepts

Join StrategiesJoin Strategies

Nested Join

Merge Join

Product Join

Page 31: Advanced Teradata Concepts

Nested JoinNested Join

Optimizer choose this join strategy when

SELECT ...FROM Table_1, Table_2WHERE Table_1.Col1 = Table_2.<Any Index>AND Table_1.<Unique Index> = <value>;

Example:

SELECT E.Name, D.DeptName FROM Employee E, Department DWHERE E.DeptNo = D.DeptNo AND E.Name = 'Sandy M';

Page 32: Advanced Teradata Concepts

Nested Join Explain OutputNested Join Explain Output

1) First, we do a two-AMP JOIN step from PERSONNEL.E by way of unique index # 4 "PERSONNEL.E.Name = 'Sandy M'" with a residual condition of ("(PERSONNEL.E.DeptNo >= 100) AND ((PERSONNEL.E.DeptNo <= 900) AND (NOT (PERSONNEL.E.DeptNo IS NULL )))"), which is joined to PERSONNEL.D by way of the unique primary index "PERSONNEL.D.DeptNo = PERSONNEL.E.DeptNo". PERSONNEL.E and PERSONNEL.D are joined using a nested join, with a join condition of ("(1=1)"). The result goes into Spool 1 (one-amp), which is built locally on the AMPs. The size of Spool 1 is estimated with high confidence to be 1 row. The estimated time for this step is 0.04 seconds. -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.04 seconds.

Page 33: Advanced Teradata Concepts

Merge JoinMerge Join

Commonly done when the join conditions are based on equality.

Steps Identify the smaller table. Put the qualifying rows from one or both table into

spool. Move the spool rows to the AMPs based on join column

hash (if required). Sort the spool rows by join column hash value (if

necessary). Compare those rows with matching join column hash

values.Example : Case 1, Case 2 and Case 3 as described.

Page 34: Advanced Teradata Concepts

Product JoinProduct Join

Most general for of join. A X B.

Optimizer choose product join usually in following conditionsWHERE clause is missing. Join condition is not based on equality

condition.

Steps: Identify the smaller table Duplicate it in spool on all AMPs. Join each spool row of the smaller table to

every row of the larger table.

Page 35: Advanced Teradata Concepts

Exclusion Merge JoinExclusion Merge Join

CREATE SET TABLE EMPLOYEE ( EmpNo SMALLINT Name VARCHAR(12), DeptNo SMALLINT NOT NULL, JobTitle VARCHAR(12), Salary DECIMAL(8,2) DOB DATE, )UNIQUE PRIMARY INDEX ( EmpNo )

CREATE SET TABLE Department ( DeptNo SMALLINT NOT NULL, DeptName VARCHAR(14), Loc CHAR(3), MgrNo SMALLINT )UNIQUE PRIMARY INDEX ( DeptNo );

SELECT EmpNo, Name, Salary FROM EmployeeWHERE DeptNo NOT IN ( SELECT DeptNo FROM Department);

Page 36: Advanced Teradata Concepts

Exclusion Merge Join Explain OutputExclusion Merge Join Explain Output

4) We do an all-AMPs RETRIEVE step from PERSONNEL.employee by way of an all-rows scan with no residual conditions into Spool 2 (all_amps), which is redistributed by hash code to all AMPs. Then we do a SORT to order Spool 2 by row hash. The size of Spool 2 is estimated with high confidence to be 21 rows. The estimated time for this step is 0.03 seconds. 5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an all-rows scan, which is joined to PERSONNEL.department. Spool 2 and PERSONNEL.department are joined using an exclusion merge join, with a join condition of ("DeptNo = PERSONNEL.department.DeptNo"). The result goes into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with index join confidence to be 21 rows. The estimated time for this step is 0.03 seconds. 6) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.06 seconds.

Page 37: Advanced Teradata Concepts

Exclusion Merge Join ExampleExclusion Merge Join Example

1003 200

1004 300

1010 200

1013 400

1018 400

1000 300

1002 100

1006 900

1011 200

1015 200

1005 300

1009 100

1014 500

1019 100

1017 300

1001 400

1007 400

1008 500

1012 300

1016 700

300100

500

400

600

200

1005 300

1017 300

1000 300

1004 300

1012 300

1001 400

1007 400

1013 400

1018 400

1010 200

1011 200

1015 200

1003 200

1006 900

300100

500

400

600

200

1002 100

1009 100

1019 100

1014 500

1018 500

1016 700

AMP 1 AMP 2 AMP 3 AMP 4

Employee

Department

Result

Page 38: Advanced Teradata Concepts

Join Indexes

Page 39: Advanced Teradata Concepts

Join IndexesJoin Indexes

Join Index is an index structure that stores and maintains results from joining two or more tables.

Optimizer resolves the query using join index, rather than performing joins every time the query is executed.

Teradata supports a variety of Join Indexes such as: Multi-table Join IndexesSingle-table join IndexesAggregate Join Indexes

Page 40: Advanced Teradata Concepts

Join Index ExampleJoin Index Example

CREATE JOIN INDEX EmpDept AS SELECT (e.DeptNo, d.DeptName) , (E.Name, E.Salary) FROM Employee e INNER JOIN Department d ON e.DeptNo = d.DeptNo;

SELECT e.Name, d.DeptName, e.Salary FROM Employee e INNER JOIN Department d ON e.DeptNo = d.DeptNoORDER BY d.DeptName;

Does the index cover the query ?

1) First, we lock a distinct PERSONNEL."pseudo table" for read on a RowHash to prevent global deadlock for PERSONNEL.EmpDept. 2) Next, we lock PERSONNEL.EmpDept for read. 3) We do an all-AMPs RETRIEVE step from PERSONNEL.EmpDept by way of an all-rows scan with no residual conditions into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 6 rows. The estimated time for this step is 0.03 seconds.…………

Page 41: Advanced Teradata Concepts

Join Index ExampleJoin Index Example

CREATE JOIN INDEX EmpDept AS SELECT (e.DeptNo, d.DeptName) , (E.Name, E.Salary) FROM Employee e INNER JOIN Department d ON e.DeptNo = d.DeptNo;

SELECT e.Name, d.DeptName, e.Salary, e.YrsExp FROM Employee e INNER JOIN Department d ON e.DeptNo = d.DeptNoORDER BY e.DeptName;

Does the index cover the query ?

Page 42: Advanced Teradata Concepts

Join Index ExampleJoin Index Example

CREATE JOIN INDEX EmpDept AS SELECT (e.DeptNo, d.DeptName) , (E.Name, E.Salary) FROM Employee e LEFT JOIN Department d ON e.DeptNo = d.DeptNo;

SELECT e.Name, d.DeptName, e.Salary,FROM Employee e INNER JOIN Department d ON e.DeptNo = d.DeptNoORDER BY e.DeptName;

Does the index cover the query ?

SELECT e.Name, d.DeptName, e.Salary,FROM Employee e LEFT JOIN Department d ON e.DeptNo = d.DeptNoORDER BY e.DeptName;

Does the index cover the query ?

Note: A join index with outer join covers both inner join query as well as outer join query.

Page 43: Advanced Teradata Concepts

Join Index ExampleJoin Index Example

CREATE JOIN INDEX EmpDept AS SELECT (e.EmpNo, e.DeptNo ,d.DeptName ), (e.Name ,e.Salary ) FROM Employee e JOIN Department d ON e.DeptNo = d.DeptNo

SELECT e.Name, d.DeptName, e.Salary , c.Hours FROM Employee e, Department d, Charges c WHERE e.DeptNo = d.DeptNo AND e.EmpNo = c.EmpNo;

Does the index cover the query ?

4) We do an all-AMPs RETRIEVE step from PERSONNEL.c by way of an all-rows scan with no residual conditions into Spool 2 (all_amps), which is redistributed by hash code to all AMPs. Then we do a SORT to order Spool 2 by row hash. The size of Spool 2 is estimated with low confidence to be 16 rows. The estimated time for this step is 0.03 seconds.5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a RowHash match scan, which is joined to PERSONNEL.EMPDEPT. Spool 2 and PERSONNEL.EMPDEPT are joined using a merge join, with a join condition of ("PERSONNEL.EMPDEPT.EmpNo = EmpNo"). The result goes into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with index join confidence to be 32 rows. The estimated time for this step is 0.04 seconds.

Page 44: Advanced Teradata Concepts

Secondary Index On Top Of Join Index Secondary Index On Top Of Join Index Further performance improvement can be achieved by defining a Secondary Index on top of join index to avoid full table scan of the join index table.

SELECT C_Name, C_Address, O_Orderdate, O_TotalPrice FROM Customer JOIN Ordertbl ON C_Custkey = O_Custkey WHERE O_Orderdate BETWEEN 950101 AND 970101;

CREATE JOIN INDEX OrderByCust ASSELECT (C_Name ,C_Address ), (O_Orderdate ,O_TotalPrice) FROM Customer INNER JOIN Ordertbl ON C_CustKey = O_Custkey

CREATE INDEX(O_Orderdate) ORDER BY VALUES ON OrderByCustomer;

COLLECT STATISTICS ON OrderByCust INDEX(O_Orderdate);

Page 45: Advanced Teradata Concepts

Sparse Indexes Sparse Indexes

Sparse Index can be used to index a portion of a table.

CREATE JOIN INDEX OrderByCust ASSELECT (C_Custkey, C_Name ,C_Address, O_Orderdate), (O_TotalPrice) FROM Customer INNER JOIN Ordertbl ON C_CustKey = O_CustkeyWHERE O_Orderdate > DATE ‘2004-01-01' PRIMARY INDEX(C_Custkey);

Shorter index table enables faster full table scan.

SELECT C_Name ,C_Address, O_Orderdate ,O_TotalPrice FROM Customer INNER JOIN Ordertbl ON C_CustKey = O_CustkeyWHERE O_Orderdate BETWEEN DATE '2004-06-01' AND DATE '2004-12-31';

Page 46: Advanced Teradata Concepts

Join Index - Compressed FormJoin Index - Compressed Form

CREATE JOIN INDEX OrderByCust ASSELECT C_Custkey, C_Name ,C_Address, O_Orderdate, O_TotalPrice FROM Customer INNER JOIN Ordertbl ON C_CustKey = O_CustkeyPRIMARY INDEX(C_Custkey);

SELECT SUM(CurrentPerm) FROM DBC.TableSize WHERE DataBaseName = 'tpch' ANDTableName = 'OrderByCust';

Sum(CurrentPerm)------------------------- 4,791,296

Uncompressed Form

Page 47: Advanced Teradata Concepts

Join Index - Compressed FormJoin Index - Compressed Form

CREATE JOIN INDEX OrderByCust ASSELECT (C_Custkey, C_Name ,C_Address), (O_Orderdate, O_TotalPrice) FROM Customer INNER JOIN Ordertbl ON C_CustKey = O_CustkeyPRIMARY INDEX(C_Custkey);

SELECT SUM(CurrentPerm) FROM DBC.TableSize WHERE DataBaseName = 'tpch' ANDTableName = 'OrderByCust';

Sum(CurrentPerm)------------------------- 1,209,856

Compressed Form

Page 48: Advanced Teradata Concepts

Single-Table Join IndexesSingle-Table Join Indexes

Single-table join indexes help in performance improvement in certain kind of joins by partially covering the query.

SELECT d.DeptName, e.Name FROM Employee e,Department d WHERE e.DeptNo = d.DeptNo;

4) We do an all-AMPs RETRIEVE step from PERSONNEL.e by way of an all-rows scan with no residual conditions into Spool 2 (all_amps), which is redistributed by hash code to all AMPs. Then we do a SORT to order Spool 2 by row hash. The size of Spool 2 is estimated with high confidence to be 21 rows. The estimated time for this step is 0.03 seconds.5) We do an all-AMPs JOIN step from PERSONNEL.d by way of a RowHash match scan with no residual conditions, which is joined to Spool 2 (Last Use). PERSONNEL.d and Spool 2 are joined using a merge join, with a join condition of ("DeptNo = PERSONNEL.d.DeptNo"). The result goes into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 18 rows. The estimated time for this step is 0.04 seconds.

Cont..

Page 49: Advanced Teradata Concepts

Single-Table Join IndexesSingle-Table Join Indexes

CREATE JOIN INDEX EmpDept AS SELECT Empno, Deptno, Name FROME Employee PRIMARY INDEX(DeptNo);

4) We do an all-AMPs JOIN step from PERSONNEL.d by way of a RowHash match scan with no residual conditions, which is joined to PERSONNEL.EmpDept. PERSONNEL.d and PERSONNEL.EmpDept are joined using a merge join, with a join condition of ( "PERSONNEL.EmpDept.DeptNo = PERSONNEL.d.DeptNo"). The result goes into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 18 rows. The estimated time for this step is 0.04 seconds.

As the join index covers the employee part of the query, Optimizer joins Department table with the join index instead of Employee table itself.

Page 50: Advanced Teradata Concepts

Single-Table Join IndexesSingle-Table Join Indexes

SELECT d.DeptName, e.Name, e.Salary FROM Employee e,Department d WHERE e.DeptNo = d.DeptNo;

4) We do an all-AMPs RETRIEVE step from PERSONNEL.e by way of an all-rows scan with no residual conditions into Spool 2 (all_amps), which is redistributed by hash code to all AMPs. Then we do a SORT to order Spool 2 by row hash. The size of Spool 2 is estimated with high confidence to be 21 rows. The estimated time for this step is 0.03 seconds.5) We do an all-AMPs JOIN step from PERSONNEL.d by way of a RowHash match scan with no residual conditions, which is joined to Spool 2 (Last Use). PERSONNEL.d and Spool 2 are joined using a merge join, with a join condition of ("DeptNo = PERSONNEL.d.DeptNo"). The result goes into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 18 rows. The estimated time for this step is 0.04 seconds.

Note : Optimizer went for full table scan of the Employee table instead of using Join Index because the existing join index EmpDept does not fully cover the Employee part of the query.

Page 51: Advanced Teradata Concepts

Single-Table Join Indexes Single-Table Join Indexes

ROWID can be included in the join index definition to enable rowid join for partially covered queries.

CREATE JOIN INDEX JIorders AS SELECT (O_CUSTKEY ), (O_ORDERDATE,O_TOTALPRICE, ROWID) FROM Ordertbl PRIMARY INDEX (O_CUSTKEY);

SELECT C_Name, C_Address, O_Orderdate, O_TotalPrice, O_OrderstatusFROM Customer, Ordertbl WHERE C_Custkey = O_Custkey AND C_Nationkey = 10;

Cont…

Page 52: Advanced Teradata Concepts

Single-Table Join IndexesSingle-Table Join Indexes

5) We do an all-AMPs JOIN step from TPCH.Customer by way of a RowHash match scan with a condition of ("TPCH.Customer.C_NATIONKEY = 10"), which is joined to TPCH.JIorders. TPCH.Customer and TPCH.JIorders are joined using a merge join, with a join condition of ( "TPCH.Customer.C_CUSTKEY = TPCH.JIorders.O_CUSTKEY"). The input table TPCH.JIorders will not be cached in memory. The result goes into Spool 2 (all_amps), which is redistributed by hash code to all AMPs. Then we do a SORT to order Spool 2 by row hash. The size of Spool 2 is estimated with no confidence to be 9,000 rows. The estimated time for this step is 0.44 seconds. 6) We do an all-AMPs JOIN step from TPCH.Ordertbl by way of a RowHash match scan with no residual conditions, which is joined to Spool 2 (Last Use). TPCH.Ordertbl and Spool 2 are joined using a merge join, with a join condition of ("Field_2 = TPCH.Ordertbl.RowID"). The input table TPCH.Ordertbl will not be cached in memory, but it is eligible for synchronized scanning. The result goes into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 9,000 rows. The estimated time for this step is 0.56 seconds.

Page 53: Advanced Teradata Concepts

Single-Table Join IndexesSingle-Table Join Indexes

CREATE JOIN INDEX JIorders AS SELECT (O_CUSTKEY ), (O_ORDERDATE,O_TOTALPRICE, O_ORDERKEY) FROM Ordertbl UNIQUE PRIMARY INDEX (O_CUSTKEY);

SELECT C_Name, C_Address, O_Orderdate, O_TotalPrice, O_OrderstatusFROM Customer, Ordertbl WHERE C_Custkey = O_Custkey AND C_Nationkey = 10;

Unique Primary Index column can also be used in place of ROWID as shown in the example below.

Page 54: Advanced Teradata Concepts

Aggregate Join IndexAggregate Join Index

Aggregate Join Index are used to store pre-calculated summary data.

SELECT L_PartKey, L_ShipDate, SUM(L_Quantity) AS SumQty FROM Lineitem GROUP BY 1,2;

3) We do an all-AMPs SUM step to aggregate from TPCH.lineitem by way of an all-rows scan with no residual conditions, and the grouping identifier in field 1. Aggregate Intermediate Results are computed globally, then placed in Spool 3. The input table will not be cached in memory, but it is eligible for synchronized scanning. The aggregate spool file will not be cached in memory. The size of Spool 3 is estimated with low confidence to be 238,809 rows. The estimated time for this step is 10.04 seconds.4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan into Spool 1 (group_amps), which is built locally on the AMPs. The result spool file will not be cached in memory. The size of Spool 1 is estimated with low confidence to be 238,809 rows. The estimated time for this step is 1.56 seconds.

Page 55: Advanced Teradata Concepts

Aggregate Join IndexAggregate Join Index

CREATE JOIN INDEX AS SELECT L_PartKey, L_ShipDate, SUM(L_Quantity) AS SumQty FROM Lineitem GROUP BY 1,2;

3) We do an all-AMPs RETRIEVE step from TPCH.JIAggLineItem by way of an all-rows scan with no residual conditions into Spool 1 (group_amps), which is built locally on the AMPs. The input table will not be cached in memory, but it is eligible for synchronized scanning. The result spool file will not be cached in memory. The size of Spool 1 is estimated with high confidence to be 238,809 rows. The estimated time for this step is 0.19 seconds.

SELECT L_PartKey, L_ShipDate, SUM(L_Quantity) AS SumQty FROM Lineitem GROUP BY 1,2;

Page 56: Advanced Teradata Concepts

Aggregate Join IndexAggregate Join Index

CREATE JOIN INDEX AS SELECT L_PartKey, L_ShipDate, SUM(L_Quantity) AS SumQty FROM Lineitem GROUP BY 1,2;

SELECT L_ShipDate, SUM(L_Quantity) AS SumQty FROM Lineitem GROUP BY 1;

3) We do an all-AMPs SUM step to aggregate from TPCH.JIAggLineItem by way of an all-rows scan with no residual conditions, and the grouping identifier in field 1. Aggregate Intermediate Results are computed globally, then placed in Spool 3. The input table will not be cached in memory, but it is eligible for synchronized scanning. The size of Spool 3 is estimated with no confidence to be 491 rows. The estimated time for this step is 0.84 seconds. 4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 491 rows. The estimated time for this step is 0.04 seconds.

Does the index cover the query ?

Page 57: Advanced Teradata Concepts

Hash Indexes

Page 58: Advanced Teradata Concepts

Hash IndexesHash Indexes

Index file structures that share properties with single table join indexes and secondary indexes.

Hash indexes are like single table join indexes but they automatically carry base table primary index value.

SELECT O_CustKey, O_Orderdate, O_Totalprice, O_OrderstatusFROM OrderTbl WHERE O_CustKey > 12;

CREATE HASH INDEX HIOrder(O_CustKey , O_OrderDate, O_TotalPrice)ON OrderTblBY (O_CustKey)ORDER BY (O_CustKey)

SELECT O_CustKey, O_Orderdate, O_Totalprice, FROM OrderTbl WHERE O_CustKey > 12;

Page 59: Advanced Teradata Concepts

Hash IndexesHash Indexes

SELECT C_Name, C_Address, O_Orderdate, O_TotalPrice, O_OrderstatusFROM Customer, Ordertbl WHERE C_Custkey = O_Custkey AND O_Custkey < 10;

CREATE HASH INDEX HIOrder(O_CustKey , O_OrderDate, O_TotalPrice)ON OrderTblBY (O_CustKey)ORDER BY (O_CustKey)

Explain

Page 60: Advanced Teradata Concepts

Hash IndexesHash Indexes

5) We do an all-AMPs JOIN step from TPCH.Customer by way of an all-rows scan with a condition of ("TPCH.Customer.C_CUSTKEY < 10"), which is joined to TPCH.HIOrder with a range constraint of ( "TPCH.HIOrder.O_CUSTKEY <= 9") with an additional condition of ( "TPCH.HIOrder.O_CUSTKEY <= 9"). TPCH.Customer and TPCH.HIOrder are joined using a product join, with a join condition of ( "TPCH.Customer.C_CUSTKEY = TPCH.HIOrder.O_CUSTKEY"). The input table TPCH.HIOrder will not be cached in memory, but it is eligible for synchronized scanning. The result goes into Spool 2 (all_amps), which is redistributed by hash code to all AMPs. Then we do a SORT to order Spool 2 by row hash. The size of Spool 2 is estimated with no confidence to be 3 rows. The estimated time for this step is 0.10 seconds.6) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a RowHash match scan, which is joined to TPCH.Ordertbl. Spool 2 and TPCH.Ordertbl are joined using a merge join, with a join condition of ("(Field_3 = (SUBSTRING((TPCH.Ordertbl.RowID) FROM 7 FOR 4 ))) AND (Field_2 =)"). The input table TPCH.Ordertbl will not be cached in memory. The result goes into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 3 rows. The estimated time for this step is 0.05 seconds.

Page 61: Advanced Teradata Concepts

Hash IndexesHash Indexes

Hash Index ( also Join Index ) can also be used to avoid row redistribution for join preparation.

SELECT C_Name, C_Address, O_Orderdate, O_TotalPrice FROM Customer, Ordertbl WHERE C_Custkey = O_Custkey;

Page 62: Advanced Teradata Concepts

Hash IndexesHash Indexes

Without Hash Index defined:

4) We do an all-AMPs RETRIEVE step from TPCH.Ordertbl by way of an all-rows scan with no residual conditions into Spool 2 (all_amps), which is redistributed by hash code to all AMPs. Then we do a SORT to order Spool 2 by row hash. The result spool file will not be cached in memory. The size of Spool 2 is estimated with high confidence to be 60,000 rows. The estimated time for this step is 1.20 seconds. 5) We do an all-AMPs JOIN step from TPCH.Customer by way of a RowHash match scan with no residual conditions, which is joined to Spool 2 (Last Use). TPCH.Customer and Spool 2 are joined using a merge join, with a join condition of ("TPCH.Customer.C_CUSTKEY = O_CUSTKEY"). The result goes into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 60,000 rows. The estimated time for this step is 0.48 seconds.…… The total estimated time is 1.68 seconds.

Page 63: Advanced Teradata Concepts

Hash IndexesHash Indexes

With Hash Index defined:

CREATE HASH INDEX HIOrder(O_Custkey, O_TotalPrice, O_OrderDate) ON OrderTbl BY (O_CustKey) ORDER BY HASH (O_CustKey);

4) We do an all-AMPs JOIN step from TPCH.Customer by way of a RowHash match scan with no residual conditions, which is joined to TPCH.HIOrder. TPCH.Customer and TPCH.HIOrder are joined using a merge join, with a join condition of ("TPCH.Customer.C_CUSTKEY = TPCH.HIOrder.O_CUSTKEY"). The input table TPCH.HIOrder will not be cached in memory. The result goes into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 60,000 rows. The estimated time for this step is 0.52 seconds.……

The total estimated time is 0.52 seconds.

No redistribution, No sorting.

Total join time significantly

reduced

Page 64: Advanced Teradata Concepts

Partitioned Primary Index

Page 65: Advanced Teradata Concepts

Partitioned Primary IndexesPartitioned Primary Indexes

Partitioned Primary Index (PPI) allows a class of queries to access a portion of a large table instead of the whole table.

PPI table rows are assigned to user defined partitions in each AMP enabling enhanced performance for range queries that are predicated on primary index values.

PPIs increase query efficiency by avoiding full table scan without the overhead and maintenance cost of secondary indexes.

Page 66: Advanced Teradata Concepts

NON-PPI TableNON-PPI Table

120 30 01/10

101 10 01/02

115 30 01/10

131 20 01/18

102 30 01/20

129 10 01/12

135 30 01/02

107 20 01/10

119 30 01/18

114 40 01/20

125 30 01/12

110 10 01/10

132 40 01/02

122 20 01/20

128 30 01/20

138 30 01/18

118 10 01/18

123 40 01/10

106 40 01/12

113 40 01/02

121 40 01/12

140 10 01/10

112 20 01/10

116 30 01/18

133 10 01/20

103 40 01/18

127 30 01/18

126 20 01/02

109 30 01/02

136 20 01/10

130 20 01/10

101 10 01/02

134 30 01/20

139 40 01/18

104 20 01/12

105 10 01/02

117 30 01/18

124 20 01/20

108 10 01/12

137 20 01/02

Records are sorted in row hash (not shown) sequence within the AMP.

SELECT * FROM Employee WHERE EmpId = 114;

SELECT * FROM Employee WHERE EmpId BETWEEN DATE ‘2004-01-12’ AND DATE ‘2004-01-18’;

Page 67: Advanced Teradata Concepts

PPI TablePPI Table

120 30 01/10

101 10 01/02

115 30 01/10

131 20 01/18

102 30 01/20

129 10 01/12

135 30 01/02

107 20 01/10

119 30 01/18

114 40 01/20

125 30 01/12

110 10 01/10

132 40 01/02

122 20 01/20

128 30 01/20

138 30 01/18

118 10 01/18

123 40 01/10

106 40 01/12

113 40 01/02

121 40 01/12

140 10 01/10

112 20 01/10

116 30 01/18

133 10 01/20

103 40 01/18

127 30 01/18

126 20 01/02

109 30 01/02

136 20 01/10 130 20 01/10

101 10 01/02

134 30 01/20

139 40 01/18

104 20 01/12

105 10 01/02

117 30 01/18

124 20 01/20

108 10 01/12

137 20 01/02

Records are sorted in row hash (not shown) sequence in each partition within the AMP.

SELECT * FROM Employee WHERE EmpId = 114;

SELECT * FROM Employee WHERE EmpId BETWEEN DATE ‘2004-01-12’ AND DATE ‘2004-01-18’;

Page 68: Advanced Teradata Concepts

PPI ExamplePPI Example

CREATE TABLE Lineitem ( L_ORDERKEY INTEGER, L_PARTKEY INTEGER, L_SUPPKEY INTEGER, L_LINENUMBER INTEGER , L_QUANTITY DECIMAL(15,2), L_EXTENDEDPRICE DECIMAL(15,2), L_DISCOUNT DECIMAL(15,2), L_TAX DECIMAL(15,2), L_RETURNFLAG CHAR(1), L_LINESTATUS CHAR(1), L_SHIPDATE DATE, L_COMMITDATE DATE, L_RECEIPTDATE DATE, L_SHIPINSTRUCT CHAR(25), L_SHIPMODE CHAR(10), L_COMMENT VARCHAR(44))PRIMARY INDEX (L_ORDERKEY);

CREATE TABLE LineitemPPI ( L_ORDERKEY INTEGER, L_PARTKEY INTEGER, L_SUPPKEY INTEGER, L_LINENUMBER INTEGER , L_QUANTITY DECIMAL(15,2), L_EXTENDEDPRICE DECIMAL(15,2), L_DISCOUNT DECIMAL(15,2), L_TAX DECIMAL(15,2), L_RETURNFLAG CHAR(1), L_LINESTATUS CHAR(1), L_SHIPDATE DATE, L_COMMITDATE DATE, L_RECEIPTDATE DATE, L_SHIPINSTRUCT CHAR(25), L_SHIPMODE CHAR(10), L_COMMENT VARCHAR(44))PRIMARY INDEX (L_ORDERKEY)PARTITION BY RANGE_N(L_ShipDate BETWEEN DATE '1992-01-03' AND DATE '1998-11-30' EACH INTERVAL '1' MONTH );

Page 69: Advanced Teradata Concepts

PPI ExamplePPI Example

NON-PPI Table:

EXPLAIN SELECT * FROM Lineitem WHERE l_Shipdate > DATE '1997-12-31';

3) We do an all-AMPs RETRIEVE step from TPCH.lineitem by way of an all-rows scan with a condition of ("TPCH.lineitem.L_SHIPDATE > DATE '1997-12-30'") into Spool 1 (group_amps), which is built locally on the AMPs. The input table will not be cached in memory, but it is eligible for synchronized scanning. The result spool file will not be cached in memory. The size of Spool 1 is estimated with high confidence to be 27,783 rows. The estimated time for this step is 3.00 seconds.

SELECT MIN(L_Shipdate), MAX(L_Shipdate) FROM Lineitem;

Minimum(L_SHIPDATE) Maximum(L_SHIPDATE)------------------------------- --------------------------------1992-01-03 1998-11-30

Page 70: Advanced Teradata Concepts

PPI ExamplePPI Example

PPI Table:

EXPLAIN SELECT * FROM LineitemPPI WHERE l_Shipdate > DATE '1997-12-31';

3) We do an all-AMPs RETRIEVE step from 12 partitions of TPCH.lineitemppi with a condition of ( "TPCH.lineitemppi.L_SHIPDATE > DATE '1997-12-30'") into Spool 1 (group_amps), which is built locally on the AMPs. The input table will not be cached in memory, but it is eligible for synchronized scanning. The result spool file will not be cached in memory. The size of Spool 1 is estimated with high confidence to be 27,735 rows. The estimated time for this step is 0.88 seconds.

Only 12 partitions are retrieved instead of a full table scan

Page 71: Advanced Teradata Concepts

PPI ExamplePPI Example

NON-PPI Table:

EXPLAIN SELECT * FROM Lineitem WHERE L_Orderkey = 240000;

1) First, we do a single-AMP RETRIEVE step from TPCH.lineitem by way of the primary index "TPCH.lineitem.L_ORDERKEY = 240000" with no residual conditions into Spool 1 (one-amp), which is built locally on that AMP. The input table will not be cached in memory, but it is eligible for synchronized scanning. The size of Spool 1 is estimated with high confidence to be 5 rows. The estimated time for this step is 0.03 seconds.

Rows are stored in rowhash order within a AMP. Search is very efficient. Only one block read.

Page 72: Advanced Teradata Concepts

PPI ExamplePPI Example

PPI Table:

EXPLAIN SELECT * FROM LineitemPPI WHERE L_Orderkey = 240000;

1) First, we do a single-AMP RETRIEVE step from all partitions of TPCH.lineitemppi by way of the primary index "TPCH.lineitemppi.L_ORDERKEY = 240000" with a residual condition of ("TPCH.lineitemppi.L_ORDERKEY = 240000") into Spool 1 (one-amp), which is built locally on that AMP. The input table will not be cached in memory, but it is eligible for synchronized scanning. The size of Spool 1 is estimated with high confidence to be 5 rows. The estimated time for this step is 0.67 seconds.

All partitions are to be scanned for comparison.

Page 73: Advanced Teradata Concepts

PPI – Delete PerformancePPI – Delete Performance

CREATE TABLE Lineitem ( L_ORDERKEY INTEGER, L_PARTKEY INTEGER, L_SUPPKEY INTEGER, L_LINENUMBER INTEGER , L_QUANTITY DECIMAL(15,2), L_EXTENDEDPRICE DECIMAL(15,2), L_DISCOUNT DECIMAL(15,2), L_TAX DECIMAL(15,2), L_RETURNFLAG CHAR(1), L_LINESTATUS CHAR(1), L_SHIPDATE DATE, L_COMMITDATE DATE, L_RECEIPTDATE DATE, L_SHIPINSTRUCT CHAR(25), L_SHIPMODE CHAR(10), L_COMMENT VARCHAR(44))PRIMARY INDEX (L_ORDERKEY);

CREATE TABLE LineitemPPI ( L_ORDERKEY INTEGER, L_PARTKEY INTEGER, L_SUPPKEY INTEGER, L_LINENUMBER INTEGER , L_QUANTITY DECIMAL(15,2), L_EXTENDEDPRICE DECIMAL(15,2), L_DISCOUNT DECIMAL(15,2), L_TAX DECIMAL(15,2), L_RETURNFLAG CHAR(1), L_LINESTATUS CHAR(1), L_SHIPDATE DATE, L_COMMITDATE DATE, L_RECEIPTDATE DATE, L_SHIPINSTRUCT CHAR(25), L_SHIPMODE CHAR(10), L_COMMENT VARCHAR(44))PRIMARY INDEX (L_ORDERKEY)PARTITION BY RANGE_N(L_ShipDate BETWEEN DATE '1992-01-03' AND DATE '1998-11-30' EACH INTERVAL '1' MONTH );

Page 74: Advanced Teradata Concepts

PPI – Delete PerformancePPI – Delete Performance

DELETE FROM Lineitem WHERE l_Shipdate BETWEEN DATE '1996-12-31' AND DATE '1997-12-31';

… 3) We do an all-AMPs DELETE from TPCH.Lineitem by way of an all-rows scan with a condition of ("(TPCH.Lineitem.L_SHIPDATE >= DATE '1996-12-31') AND (TPCH.Lineitem.L_SHIPDATE <= DATE '1997-12-31')").…

… 3) We do an all-AMPs DELETE from 2 partitions of TPCH.LineitemPPI with a condition of ("(TPCH.LineitemPPI.L_SHIPDATE >= DATE '1996-12-31') AND (TPCH.LineitemPPI.L_SHIPDATE <= DATE '1997-12-31')"). 4) We do an all-AMPs DELETE of 11 partitions of TPCH.LineitemPPI with a condition of ("(TPCH.LineitemPPI.L_SHIPDATE >= DATE '1996-12-31') AND (TPCH.LineitemPPI.L_SHIPDATE <= DATE '1997-12-31')").

DELETE FROM LineitemPPIWHERE l_Shipdate BETWEEN DATE '1996-12-31' AND DATE '1997-12-31';

Page 75: Advanced Teradata Concepts

PPI JoinsPPI Joins

CREATE TABLE LineitemPPI ( L_ORDERKEY INTEGER, L_PARTKEY INTEGER, L_SUPPKEY INTEGER, L_LINENUMBER INTEGER , L_QUANTITY DECIMAL(15,2), L_EXTENDEDPRICE DECIMAL(15,2), L_DISCOUNT DECIMAL(15,2), L_TAX DECIMAL(15,2), L_RETURNFLAG CHAR(1), L_LINESTATUS CHAR(1), L_COMMENT VARCHAR(44))PRIMARY INDEX (L_ORDERKEY)PARTITION BY RANGE_N(L_ShipDate BETWEEN DATE '1992-01-03' AND DATE '1998-11-30' EACH INTERVAL '1' MONTH );

CREATE TABLE Shipping ( S_ORDERKEY INTEGER, S_SHIPDATE DATE, S_RECEIPTDATE DATE, S_SHIPINSTRUCT CHAR(25), S_SHIPMODE CHAR(10))PRIMARY INDEX (S_ORDERKEY)PARTITION BY RANGE_N(S_ShipDate BETWEEN DATE '1992-01-03' AND DATE '1998-11-30' EACH INTERVAL '1' MONTH );

SELECT L_Orderkey, L_Shipdate,S_Shipmode FROM LineitemPPI INNER JOIN Shipping ON L_Orderkey = S_Orderkey;

Page 76: Advanced Teradata Concepts

PPI JoinsPPI Joins

4) We do an all-AMPs JOIN step from all partitions of TPCH.shipping by way of a RowHash match scan with a condition of ("NOT (TPCH.shipping.S_SHIPDATE IS NULL)"), which is joined to TPCH.lineitemppi with a condition of ("NOT (TPCH.lineitemppi.L_SHIPDATE IS NULL)"). TPCH.shipping and TPCH.lineitemppi are joined using a rowkey-based merge join, with a join condition of ("(TPCH.lineitemppi.L_SHIPDATE = TPCH.shipping.S_SHIPDATE) AND (TPCH.lineitemppi.L_ORDERKEY = TPCH.shipping.S_ORDERKEY)"). The input tables TPCH.shipping and TPCH.lineitemppi will not be cached in memory, but TPCH.shipping is eligible for synchronized scanning. The result goes into Spool 1 (group_amps), which is built locally on the AMPs. The result spool file will not be cached in memory. The size of Spool 1 is estimated with low confidence to be 401,785 rows. The estimated time for this step is 4.85 seconds.

Page 77: Advanced Teradata Concepts

Collect Statistics

Page 78: Advanced Teradata Concepts

Collect StatisticsCollect Statistics

Optimizer must be provided with correct demographic information on your data to choose optimal plan to execute your query.

Statistics tells the optimizer How many rows per value are there.How many distinct values are there in the

column.

If Collected Statistics are not available, optimizer does random AMP sampling to derive demographics.

Page 79: Advanced Teradata Concepts

Collect StatisticsCollect Statistics

Collected statistics are not automatically updated by Teradata DBS.

User must refresh statistics when 5% to 10% change on the table rows.

Collect Statistics on

All non-unique Indexes of a table or a join index.

Any column used in WHERE clause for set selection or join constraint.

Page 80: Advanced Teradata Concepts

Collect StatisticsCollect Statistics

COLLECT STATISTICS ON Lineitem COLUMN L_Orderkey;

COLLECT STATISTICS ON Lineitem COLUMN (L_Orderkey, L_Shipdate);

COLLECT STATISTICS ON Lineitem COLUMN L_Shipdate;

HELP STATISTICS Lineitem;

Date Time Unique Values Column Names------------- ------------ -------------------- ------------------------------------04/10/05 11:04:48 60,000 L_ORDERKEY04/10/05 09:57:52 2,524 L_SHIPDATE04/10/05 11:49:47 236,352 L_ORDERKEY,L_SHIPDATE

Page 81: Advanced Teradata Concepts

Data Compression

Page 82: Advanced Teradata Concepts

Data Compression Data Compression

Makes row sizes smaller

Allows more rows per block

Reduces the number of I/Os

Implemented in column level

Compression is a I/O-intensive workload.

Improvement gained through the more-rows-per-block concept is significant in the Full Table Scan operations.

Compression is transparent to applications.

Page 83: Advanced Teradata Concepts

Data CompressionData Compression

CREATE TABLE Employee(EmployeeNo INTEGER…Jobtitle CHARACTER(30) COMPRESS (’cashier’)…);

Single-Value Compression V2R4 and prior

CREATE TABLE Employee(EmployeeNo INTEGER…Jobtitle CHARACTER(30) COMPRESS (’cashier’, ‘manager’, ‘programmer’)...);

Multi-Value CompressionV2R5 and later

Nulls and cashiers will be compressed.

Cashiers, managers, programmers will be compressed including nulls.

255 distinct values for an individualcolumn can be compressed.

Page 84: Advanced Teradata Concepts

Data Compression - ImplementationData Compression - Implementation

The following graphics shows how Data Compression is implemented in Teradata

Table Header

Field: StreetAddressVARCHAR(40)

Field: City CAHR(20)01 ‘Chicago ‘10 ‘Los Angeles ‘11 ‘New York ‘

Field: StateCode Char(2)

CREATE TABLE CompressExample (StreetAddress VARCHAR(40),City CHARACTER(20) COMPRESS (’New York’, ’Los Angeles’, ’Chicago’) NOT NULL,StateCode CHARACTER(2));

130 Sutter St.

00

01

11

01

11

00

10

San Francisco

CA

133 Wacker Drive.

IL

5 Times Sq. NY

900 North Michigan Av.

IL

135 East 57th NY

1525 Howe St. Racine WI

304 S. Broadway CA

Actual Data Rows

Page 85: Advanced Teradata Concepts

Multi Value Compression & VARCHAR Multi Value Compression & VARCHAR VARCHAR consumes two extra bytes for each value

whereas compression consumes CPU resource to decode compress values.

The data demographics determine whether variable length character data type or fixed length plus compression is more efficient.

VARCHAR is better when the difference of max and average field length is high and a low percentage of fields are compressible.

Compression is better when the difference of max and average field length is low and a high percentage of fields are compressible.

If no clear pictures about data demographics are available, use VARCHAR as it is less CPU intensive.

Page 86: Advanced Teradata Concepts

Query Management

Page 87: Advanced Teradata Concepts

Priority SchedulerPriority Scheduler

Configure the system to execute queries at a higher priority submitted by Sales Managers.

A DBA may want to:

Or

Configure the system to execute queries submitted by Development group at a lower priority during 8:00 AM and 3:00 PM and execute at medium priority during 3:00 PM and 8:00 AM.

Or

Lower the priority of a job if it takes more than one hour to complete.

Page 88: Advanced Teradata Concepts

Priority SchedulerPriority Scheduler

Can be used to control resources allocated to users.

Administrator can specify performance group while creating the user.

It manages resource distribution to improve performance of one application at the expense of other.

Page 89: Advanced Teradata Concepts

Priority Scheduler ComponentsPriority Scheduler Components

RP#

L M H R

8am–5pmAG1

5pm-9pmAG2

9pm-8amAG3

8pm-8amAG4

8am-8pmAG3

AG15

AG210

AG320

AG440

Resource Partition

PerformanceGroups

PerformancePeriods

AllocationGroups

Page 90: Advanced Teradata Concepts

Priority Scheduler ComponentsPriority Scheduler Components

Resource Partition High level Resource

Partitioning Default is Partition 0

Performance Period Controls the scheduling

policy at that point in time. Links a PG to an Allocation

Group’s weight and policy

Performance Group Provides relative priority with

in the Resource Partition Can be specified in the

Account String in Create User statement.

Can be specified in user Logon String ($M$, $DEV$ etc).

Allocation Group Defines a method for

disbursing resources among sessions active within that allocation group

Carries the weight. Defines a scheduling policy

Page 91: Advanced Teradata Concepts

Example 1 – Percentage of Resource AllocationExample 1 – Percentage of Resource Allocation

User WHDev with performance group $L$ logged on to the system at 9:30 PM.

What is the percentage share of system resources the user WHDev will get ?

Sum of the weights = 5 + 10 + 20 + 40 = 75

At 9:30 PM performance group L will be assigned to allocation group AG3.

So % of resource allocation = 20/75 = 26%

Page 92: Advanced Teradata Concepts

Example 2 - Automatic Change in Priority Based CPU usage Example 2 - Automatic Change in Priority Based CPU usage

Performance Period 1Usage 3600 SecondsAllocation Group AG11

Allocation Grp=AG11Weight=40

Performance Period 2Usage 0 SecondsAllocation Group AGDEF

Allocation Grp=AGDEFWeight=5

01000 2000 3000 Time

Pri

ori

ty

5

10

20

Page 93: Advanced Teradata Concepts

Teradata Dynamic Query ManagerTeradata Dynamic Query Manager

Prevent all queries that are estimated to return more than 100,000 rows from running between the hours of 8:00 a.m. and 1:00 p.m. on Fridays.

A DBA may want to:

Or

Prevent all queries from Testing group that are estimated to take more than 3 minutes running between the hours of 8:00 a.m. and 3.00 p.m. on Monday.

Or

Schedule a request to run on every Friday at 8.00 pm.

Page 94: Advanced Teradata Concepts

Teradata Dynamic Query ManagerTeradata Dynamic Query ManagerTeradata Dynamic Query Manager (TDQM) is product

that enables you to effectively manage the access to and utilization of a Teradata database system.

Managing the database system increases the workload capacity and efficiency of database usage.

TDQM addresses the key problems of database system overload and network saturation that result from large number of clients accessing the Teradata system.

Two main functionalities of TDQM are: Limiting the execution of some queries on the Teradata

database according to rules – Query Management. Scheduling SQL request for batch execution – Scheduled

Requests.

Page 95: Advanced Teradata Concepts

TDQM ArchitectureTDQM Architecture

Teradata RDBMS

TDQMPartition

User DataTDQM

Metadata

All Client systems

accessing Teradata.

TDQM Administrat

or

Scheduled Requests

Client

Scheduled Request Server

Query Management Scheduled Requests

Page 96: Advanced Teradata Concepts

Teradata Dual Active SolutionTeradata Dual Active Solution

Provides support for unplanned down time.

Eliminates the need for planned down time.

Provides additional processing power to smooth out peak workload on the primary system.

100% data replication as well as only mission-critical data replication possible.

Page 97: Advanced Teradata Concepts

Teradata Dual Active Solution ArchitectureTeradata Dual Active Solution Architecture

Primary System

Backup System

DataSynchronization

Teradata Query Director

Users/Applications

Operation ControlUsers/

Applications

Users/Applications

Page 98: Advanced Teradata Concepts

Teradata Query DirectorTeradata Query Director

Designed to intelligently route queries based on customer-established rules.

Helps to share workload between the system.

Provide failover capability.

Page 99: Advanced Teradata Concepts

Questions ?