Teradata-Day2

8/10/2019 Teradata-Day2

TERADATA - DAY 2

Teradata Indexes

Types of tables

Prepared By

nilKumar P


-Primary index

-Unique Primary Index (UPI)

-Non Unique Primary Index(NUPI)

-No Primary Index (NOPI)

-Partition Primary Index(PPI)

-Secondary Index

-Unique Secondary Index (USI)

-Non Unique Secondary Index(NUSI)

-Join Index

-Single Table Join Index (STJI)

-Multi Table Join Index (MTJI)

-Aggregate Join Index (AJI)

-Hash Index

-Types of tables

-Set table

-Multi set table

-Derived table

-Volatile table

-Global Temporary Table

-Locks


Types of tables:

Derived tables are always local to a single SQL request. They are built dynamically using an additional SELECT within the query. The rows of the derived table are stored in spool and discarded as soon as the query finishes.

Volatile temporary tables are local to a session rather than to a specific query, so the table may be used repeatedly within a user session. That is the major difference between volatile temporary tables (multiple use) and derived tables (single use). Like a derived table, a volatile temporary table is materialized in spool space. However, it is not discarded until the session ends or the user drops it manually.

Global temporary tables (GTT) are local to a session, like volatile tables, but they use temporary space. The major difference is that the GTT definition is stored in the Data Dictionary, while the data is not. When the user ends the session, the data is deleted automatically but the definition remains.


Ex 1 : Select * From ( Select AVG(SAL) as Avgsalary From Emp) sample;

Ex 2 : A Derived Table that Joins to an Existing Table

Show all employees and their average salary per department.

SELECT Dept_No, First_Name, Last_Name, AVGSAL
FROM Employee_Table
INNER JOIN
(SELECT Dept_No, AVG(Salary) FROM Employee_Table
GROUP BY Dept_No) AS Sample (Dno, AVGSAL)
ON Dept_No = Dno;

The first three columns in the answer set come from Employee_Table. AVGSAL comes from the derived table named Sample.
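The join in Ex 2 can be tried out with a small runnable sketch. It uses Python's built-in sqlite3 module as a stand-in for Teradata, the three sample rows are made up for illustration, and the derived column names are assigned inside the inner SELECT rather than with the `AS Sample (Dno, AVGSAL)` column list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee_Table "
    "(Dept_No INTEGER, First_Name TEXT, Last_Name TEXT, Salary REAL)"
)
# Hypothetical sample rows, not taken from the slides.
conn.executemany(
    "INSERT INTO Employee_Table VALUES (?, ?, ?, ?)",
    [(10, "Ann", "Lee", 1000.0), (10, "Bob", "Ray", 3000.0), (20, "Cid", "Fox", 5000.0)],
)

# The inner SELECT is the derived table; it exists only for this one request.
rows = conn.execute(
    """
    SELECT e.Dept_No, e.First_Name, e.Last_Name, s.AVGSAL
    FROM Employee_Table AS e
    INNER JOIN (SELECT Dept_No AS Dno, AVG(Salary) AS AVGSAL
                FROM Employee_Table
                GROUP BY Dept_No) AS s
      ON e.Dept_No = s.Dno
    ORDER BY e.Dept_No, e.First_Name
    """
).fetchall()
conn.close()

for row in rows:
    print(row)
```

Every employee row is returned together with the average salary of its own department, which is the value supplied by the derived table.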

Derived Table Example :


Derived table name is tmp.

The table is required for this query but no others.

The query will be run only one time with this data.

Derived column names are Prodid and Sumsales.

The table is created in spool using the inner SELECT.

The SELECT statement is always in parentheses following FROM.


Volatile temporary tables are local to a session rather than to a specific query, so the table may be used repeatedly within a user session. That is the major difference between volatile temporary tables (multiple use) and derived tables (single use). Like a derived table, a volatile temporary table is materialized in spool space. However, it is not discarded until the session ends or the user drops it manually.

Syntax:

CREATE VOLATILE TABLE Dept_Agg_Vol , NO LOG
( Dept_no Integer
,Sum_Salary Decimal(10,2))
ON COMMIT PRESERVE ROWS ;

NO LOG allows for better performance.

LOG indicates that a transaction journal is maintained.

PRESERVE ROWS keeps the table rows at the end of each transaction.

DELETE ROWS deletes all table rows at the end of each transaction.


The Three Steps to Use a Volatile Table :

1) Create the volatile table.

2) Populate it with an INSERT/SELECT statement.

3) Query it as often as needed until you log off.

CREATE VOLATILE TABLE Dept_Agg_Vol , NO LOG
( Dept_no Integer
,Sum_Salary Decimal(10,2)
)
ON COMMIT PRESERVE ROWS ;

INSERT INTO Dept_Agg_Vol
SELECT Dept_no,SUM(Salary)
FROM Employee_Table
GROUP BY Dept_no ;

SELECT * FROM Dept_Agg_Vol
ORDER BY 1;

ON COMMIT PRESERVE ROWS is needed here: the default is ON COMMIT DELETE ROWS, which would empty the table at the end of the INSERT transaction.
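The session-local behaviour of the three steps can be sketched in a runnable form. The example below uses SQLite's TEMP tables through Python's sqlite3 module as a rough analogue of a volatile table; this is an assumption made for illustration, since SQLite has no NO LOG or ON COMMIT options, and the database file and sample salaries are made up.

```python
import os
import sqlite3
import tempfile

# A throwaway database file so that two separate sessions can connect to it.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

session1 = sqlite3.connect(db_path)
session1.execute("CREATE TABLE Employee_Table (Dept_no INTEGER, Salary REAL)")
session1.executemany(
    "INSERT INTO Employee_Table VALUES (?, ?)",
    [(10, 100.0), (10, 50.0), (20, 70.0)],
)
session1.commit()

# Steps 1 and 2: create the session-local table and populate it with INSERT/SELECT.
session1.execute("CREATE TEMP TABLE Dept_Agg_Vol (Dept_no INTEGER, Sum_Salary REAL)")
session1.execute(
    "INSERT INTO Dept_Agg_Vol "
    "SELECT Dept_no, SUM(Salary) FROM Employee_Table GROUP BY Dept_no"
)
session1.commit()

# Step 3: query it as often as needed within the same session.
totals = session1.execute("SELECT * FROM Dept_Agg_Vol ORDER BY 1").fetchall()

# A second session cannot see the first session's temporary table.
session2 = sqlite3.connect(db_path)
try:
    session2.execute("SELECT * FROM Dept_Agg_Vol")
    visible_elsewhere = True
except sqlite3.OperationalError:
    visible_elsewhere = False

session1.close()
session2.close()
```

The aggregated rows stay available to the first connection until it closes, while the second connection never sees the table.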


HELP VOLATILE TABLE ;

This command displays the names of all volatile temporary tables active in the current user session.

SessionID  TableName     TableId  Protection  CreatorName  CommitOption  TransactionLog
1010       Dept_Agg_Vol  10C0C04  N           Anil         P             N


CREATE GLOBAL TEMPORARY TABLE Dept_Agg_GLO
( Dept_no Integer
,Sum_Salary Decimal(10,2)) ;

Global temporary tables also have the LOG and ON COMMIT PRESERVE/DELETE ROWS options.

Global temporary tables are local to a session, like volatile tables, but they use temporary space. The major difference is that the GTT definition is stored in the Data Dictionary, while the data is not. When the user ends the session, the data is deleted automatically but the definition remains.


The Three Steps to using a Global Temporary Table

CREATE GLOBAL TEMPORARY TABLE Dept_Agg_GLO
( Dept_no Integer
,Sum_Salary Decimal(10,2)
)
ON COMMIT PRESERVE ROWS ;


INSERT INTO Dept_Agg_GLO

SELECT Dept_no,SUM(Salary)

FROM Employee_Table

GROUP BY Dept_no ;

SELECT * FROM Dept_Agg_GLO

ORDER BY 1;


Primary Index :

A Primary Index (PI) is the physical mechanism for assigning a data row to an AMP and to a location on that AMP's disks. It is also used to access rows without having to search the entire table.

The rows of every table are distributed among all AMPs, and each AMP is responsible for a subset of the rows of each table. Ideally, each table is evenly distributed among all AMPs, because evenly distributed tables result in evenly distributed workloads. The uniformity of the row distribution depends on the choice of the Primary Index.

Three purposes of the primary index:

1-Distribution of rows to the proper AMP.

2-Fastest way to retrieve a single row.

3-Accessing rows for joins.
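The effect of the Primary Index choice on distribution can be illustrated with a minimal sketch. It assumes a 4-AMP system, uses MD5 as a stand-in for Teradata's hashing algorithm (which it is not), and generates 1,000 hypothetical rows with a unique order number and a two-valued status column.

```python
import hashlib
from collections import Counter

NUM_AMPS = 4  # assumed system size for the illustration


def amp_for(pi_value):
    """Map a primary index value to an AMP (stand-in hash, not Teradata's)."""
    digest = hashlib.md5(str(pi_value).encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_AMPS


# 1,000 hypothetical rows: a unique order number and a status with two values.
rows = [(order_no, "O" if order_no % 2 else "C") for order_no in range(1000)]

# Rows per AMP when the unique column is the PI versus the two-valued column.
by_order_no = Counter(amp_for(order_no) for order_no, _ in rows)
by_status = Counter(amp_for(status) for _, status in rows)

print("PI = order number:", dict(by_order_no))
print("PI = status:      ", dict(by_status))
```

With the unique column as the PI the rows spread over all AMPs; with the two-valued column at most two AMPs receive any rows, so the workload is skewed.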


A Hashing Example

Order table (OrderNumber is the PK and UPI):

OrderNumber  CustomerNumber  OrderDate  OrderStatus
7325         2               4/13       O
7324         3               4/13       O
7415         3               4/13       O
7415         1               4/13       C
7103         1               4/10       O
7225         2               4/15       C
7384         1               4/12       C
7402         3               4/12       C
7188         1               4/13       C
7202         2               4/09       C

SELECT * FROM order WHERE order_number = 7202;

7202 -> Hashing Algorithm -> 691B 14AE (32-bit row hash, hexadecimal)

32-bit row hash:             0110 1001 0001 1011 0001 0100 1010 1110
Destination Selection Word:  0110 1001 0001 1011 (6 9 1 B)
Remaining 16 bits:           0001 0100 1010 1110


The Hash Map

7202 -> Hashing Algorithm -> 691B 14AE (32-bit row hash, hexadecimal)

32-bit row hash:             0110 1001 0001 1011 0001 0100 1010 1110
Destination Selection Word:  0110 1001 0001 1011 (6 9 1 B)
Remaining 16 bits:           0001 0100 1010 1110

HASH MAP

      0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
690  07 06 07 06 07 04 05 06 05 05 14 09 14 13 03 04
691  15 08 02 04 01 00 14 14 03 02 03 09 01 00 02 15
692  01 00 15 11 14 14 13 13 14 14 08 09 15 10 09 09
693  07 06 15 13 11 06 15 08 15 15 08 08 11 07 05 10
694  04 12 11 13 05 10 07 07 03 02 11 04 01 00 11 13
695  11 11 12 10 03 02 06 13 01 00 06 05 07 06 05 12

The Destination Selection Word 691B selects row 691, column B of the hash map, which contains AMP 9. The row (7202 2 4/09 C) is therefore stored on AMP 9.

Note: This partial Hash Map is based on a 16 AMP system and AMPs are shown in decimal format.
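The lookup on this slide can be reproduced with a short sketch. The table below copies the six hash map rows shown above; the function name and the way the Destination Selection Word is split into a row and a column are illustrative choices that mirror the slide's layout, not Teradata internals.

```python
# Partial hash map from the slide: rows 690-695, columns 0-F, AMP numbers in decimal.
HASH_MAP = {
    0x690: [7, 6, 7, 6, 7, 4, 5, 6, 5, 5, 14, 9, 14, 13, 3, 4],
    0x691: [15, 8, 2, 4, 1, 0, 14, 14, 3, 2, 3, 9, 1, 0, 2, 15],
    0x692: [1, 0, 15, 11, 14, 14, 13, 13, 14, 14, 8, 9, 15, 10, 9, 9],
    0x693: [7, 6, 15, 13, 11, 6, 15, 8, 15, 15, 8, 8, 11, 7, 5, 10],
    0x694: [4, 12, 11, 13, 5, 10, 7, 7, 3, 2, 11, 4, 1, 0, 11, 13],
    0x695: [11, 11, 12, 10, 3, 2, 6, 13, 1, 0, 6, 5, 7, 6, 5, 12],
}


def amp_for_row_hash(row_hash):
    """Return the AMP for a 32-bit row hash using the partial hash map."""
    dsw = row_hash >> 16      # Destination Selection Word: first 16 bits
    map_row = dsw >> 4        # first three hex digits pick the map row (e.g. 691)
    map_column = dsw & 0xF    # last hex digit picks the column (e.g. B)
    return HASH_MAP[map_row][map_column]


amp = amp_for_row_hash(0x691B14AE)
print("Row hash 691B 14AE goes to AMP", amp)
```

Only the first 16 bits take part in choosing the AMP; the remaining 16 bits are kept as part of the row hash stored with the row.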


Identifying Rows

Consideration #1

A Row Hash = 32 bits = 4.2 billion possible values.

Because there is an infinite number of possible data values, some data values will have to share the same row hash.

Data values input -> Hash Algorithm -> 1254 7769
                                       10A2 2936
                                       10A2 2936   (Hash Synonyms)

Consideration #2

A Primary Index may be non-unique (NUPI). Different rows will have the same PI value and thus the same row hash.

(John) 'Smith' -> Hash Algorithm -> 0016 5557
(Dave) 'Smith' -> Hash Algorithm -> 0016 5557   (NUPI Duplicates: rows have the same hash)

Conclusion

A row hash is not adequate to uniquely identify a row.
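Because a row hash alone cannot identify a row, the RowIDs shown on the later index slides pair the row hash with a uniqueness value (for example "778, 7"). A minimal sketch of that idea follows; the function name and the simple per-hash counter are assumptions for illustration, not the actual assignment logic.

```python
from collections import defaultdict


def assign_row_ids(row_hashes):
    """Pair each row hash with a per-hash sequence number to form a unique RowID."""
    next_uniqueness = defaultdict(int)
    row_ids = []
    for row_hash in row_hashes:
        next_uniqueness[row_hash] += 1
        row_ids.append((row_hash, next_uniqueness[row_hash]))
    return row_ids


# Two 'Smith' rows share hash 0016 5557; a third row has a different hash.
row_ids = assign_row_ids([0x00165557, 0x00165557, 0x10A22936])
for row_hash, uniqueness in row_ids:
    print(f"{row_hash:08X}, {uniqueness}")
```

The two rows with the same hash receive different uniqueness values, so every RowID in the list is distinct.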


Secondary Index :

There are 3 general ways to access a table:

-Primary Index access (one AMP access)

-Secondary Index access (two or all AMP access)

-Full Table Scan (all AMP access)

A secondary index provides an alternate path to the rows of a table.

A table can have from 0 to 32 secondary indexes.

Secondary Indexes:

-Do not affect table distribution.

-Add overhead, both in terms of disk space and maintenance.

-May be added or dropped dynamically as needed.

-Are chosen to improve table performance.


Choosing a Secondary Index

A Secondary Index may be defined:

-at table creation (CREATE TABLE)

-following table creation (CREATE INDEX)

It supports up to 64 columns.

USI (Unique Secondary Index)

If the index choice of column(s) is unique, it is called a USI. Accessing a row via a USI is a 2 AMP operation.

CREATE UNIQUE INDEX
(Employee_Number) ON Employee;

NUSI (Non-Unique Secondary Index)

If the index choice of column(s) is non-unique, it is called a NUSI. Accessing row(s) via a NUSI is an all AMP operation.

CREATE INDEX
(Last_Name) ON Employee;

Notes:

Secondary Indexes cause an internal sub-table to be built.

Dropping the index causes the sub-table to be deleted.


Unique Secondary Index (USI) Access

Create USI:

CREATE UNIQUE INDEX (Cust) ON Customer;

Access via USI:

SELECT *
FROM Customer
WHERE Cust = 56;

Customer: Table ID = 100

1) The PE applies the hashing algorithm to the USI value 56 and sends (Table ID 100, Row Hash 602, USI Value 56) to the Message Passing Layer (MPL).

2) The AMP that owns row hash 602 (AMP 2) finds the matching USI subtable row, which holds the base table RowID (Row Hash 778, Unique Val 7).

3) (Table ID 100, Row Hash 778, Unique Val 7) goes through the Message Passing Layer to the AMP that holds the base row (AMP 4), which returns the row.

USI Subtables

AMP  RowID   Cust  Base table RowID
1    244, 1  74    884, 1
1    505, 1  77    639, 1
1    744, 4  51    915, 9
1    757, 1  27    388, 1
2    135, 1  98    555, 6
2    296, 1  84    536, 5
2    602, 1  56    778, 7
2    969, 1  49    147, 1
3    288, 1  31    638, 1
3    339, 1  40    640, 1
3    372, 2  45    471, 1
3    588, 1  95    778, 3
4    175, 1  37    107, 1
4    489, 1  72    717, 2
4    838, 4  12    147, 2
4    919, 1  62    822, 1

Base Tables

AMP  RowID   Cust (USI)  Name    Phone (NUPI)
1    471, 1  45          Adams   444-6666
1    555, 6  98          Brown   333-9999
1    717, 2  72          Adams   666-7777
1    884, 1  74          Smith   555-6666
2    147, 1  49          Smith   111-6666
2    147, 2  12          Young   777-4444
2    388, 1  27          Jones   222-8888
2    822, 1  62          Black   444-5555
3    107, 1  37          White   555-4444
3    536, 5  84          Rice    666-5555
3    638, 1  31          Adams   111-2222
3    640, 1  40          Smith   222-3333
4    639, 1  77          Jones   777-6666
4    778, 3  95          Peters  555-7777
4    778, 7  56          Smith   555-7777
4    915, 9  51          Marsh   888-2222
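The two-AMP nature of USI access can be traced with a small simulation built from the slide's data. The dictionaries and the `usi_lookup` function are illustrative; in particular, searching the dictionaries for the owning AMP stands in for the hashing and hash map steps, and the AMP numbers follow the left-to-right order of the slide.

```python
# USI subtable rows per AMP: Cust value -> RowID of the base table row.
USI_SUBTABLE = {
    1: {74: (884, 1), 77: (639, 1), 51: (915, 9), 27: (388, 1)},
    2: {98: (555, 6), 84: (536, 5), 56: (778, 7), 49: (147, 1)},
    3: {31: (638, 1), 40: (640, 1), 45: (471, 1), 95: (778, 3)},
    4: {37: (107, 1), 72: (717, 2), 12: (147, 2), 62: (822, 1)},
}
# Base table rows per AMP: RowID -> (Cust, Name, Phone).
BASE_TABLE = {
    1: {(471, 1): (45, "Adams", "444-6666"), (555, 6): (98, "Brown", "333-9999"),
        (717, 2): (72, "Adams", "666-7777"), (884, 1): (74, "Smith", "555-6666")},
    2: {(147, 1): (49, "Smith", "111-6666"), (147, 2): (12, "Young", "777-4444"),
        (388, 1): (27, "Jones", "222-8888"), (822, 1): (62, "Black", "444-5555")},
    3: {(107, 1): (37, "White", "555-4444"), (536, 5): (84, "Rice", "666-5555"),
        (638, 1): (31, "Adams", "111-2222"), (640, 1): (40, "Smith", "222-3333")},
    4: {(639, 1): (77, "Jones", "777-6666"), (778, 3): (95, "Peters", "555-7777"),
        (778, 7): (56, "Smith", "555-7777"), (915, 9): (51, "Marsh", "888-2222")},
}


def usi_lookup(cust):
    """Return (AMPs touched, base row) for a USI value; ([], None) if absent."""
    # Stand-in for hashing the USI value: find the AMP whose subtable holds it.
    sub_amp = next((amp for amp, rows in USI_SUBTABLE.items() if cust in rows), None)
    if sub_amp is None:
        return [], None
    row_id = USI_SUBTABLE[sub_amp][cust]
    # Stand-in for hashing the base RowID: find the AMP that stores that row.
    base_amp = next(amp for amp, rows in BASE_TABLE.items() if row_id in rows)
    return [sub_amp, base_amp], BASE_TABLE[base_amp][row_id]


amps_touched, row = usi_lookup(56)
print(amps_touched, row)
```

For `Cust = 56` the subtable AMP and the base table AMP are the only two AMPs involved, which matches the "2 AMP operation" described earlier.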


Non-Unique Secondary Index (NUSI) Access

Create NUSI:

CREATE INDEX (Name) ON Customer;

Access via NUSI:

SELECT *
FROM Customer
WHERE Name = 'Adams';

Customer: Table ID = 100

1) The PE applies the hashing algorithm to the NUSI value 'Adams' and sends (Table ID 100, Row Hash 567, NUSI Value 'Adams') to the Message Passing Layer (MPL).

2) The request goes to all AMPs. Each AMP searches its own NUSI subtable, which indexes only the base rows stored on that AMP.

3) AMPs that find 'Adams' (AMP 1 and AMP 3) read the listed base table rows locally and return them.

NUSI Subtables

AMP  RowID   Name    Base table RowID(s)
1    432, 3  Smith   884, 1
1    567, 2  Adams   471, 1 and 717, 2
1    852, 1  Brown   555, 6
2    432, 1  Smith   147, 1
2    448, 4  Black   822, 1
2    567, 6  Jones   388, 1
2    770, 1  Young   147, 2
3    432, 8  Smith   640, 1
3    448, 1  White   107, 1
3    567, 3  Adams   638, 1
3    656, 1  Rice    536, 5
4    155, 1  Marsh   915, 9
4    396, 1  Peters  778, 3
4    432, 5  Smith   778, 7
4    567, 1  Jones   639, 1

Base Tables

AMP  RowID   Cust  Name (NUSI)  Phone (NUPI)
1    471, 1  45    Adams        444-6666
1    555, 6  98    Brown        333-9999
1    717, 2  72    Adams        666-7777
1    884, 1  74    Smith        555-6666
2    147, 1  49    Smith        111-6666
2    147, 2  12    Young        777-4444
2    388, 1  27    Jones        222-8888
2    822, 1  62    Black        444-5555
3    107, 1  37    White        555-4444
3    536, 5  84    Rice         666-5555
3    638, 1  31    Adams        111-2222
3    640, 1  40    Smith        222-3333
4    639, 1  77    Jones        777-6666
4    778, 3  95    Peters       555-7777
4    778, 7  56    Smith        555-7777
4    915, 9  51    Marsh        888-2222
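The all-AMP character of NUSI access can be sketched with the same Customer rows. In this simulation each AMP builds a local subtable over its own rows only; the structure and function names are illustrative, and the AMP numbers follow the order of the base tables on the slide.

```python
# Base table rows per AMP from the slide: (RowID, Cust, Name, Phone).
BASE_TABLE = {
    1: [((471, 1), 45, "Adams", "444-6666"), ((555, 6), 98, "Brown", "333-9999"),
        ((717, 2), 72, "Adams", "666-7777"), ((884, 1), 74, "Smith", "555-6666")],
    2: [((147, 1), 49, "Smith", "111-6666"), ((147, 2), 12, "Young", "777-4444"),
        ((388, 1), 27, "Jones", "222-8888"), ((822, 1), 62, "Black", "444-5555")],
    3: [((107, 1), 37, "White", "555-4444"), ((536, 5), 84, "Rice", "666-5555"),
        ((638, 1), 31, "Adams", "111-2222"), ((640, 1), 40, "Smith", "222-3333")],
    4: [((639, 1), 77, "Jones", "777-6666"), ((778, 3), 95, "Peters", "555-7777"),
        ((778, 7), 56, "Smith", "555-7777"), ((915, 9), 51, "Marsh", "888-2222")],
}


def build_local_nusi(rows):
    """Each AMP indexes only its own rows: Name -> list of local RowIDs."""
    subtable = {}
    for row_id, _cust, name, _phone in rows:
        subtable.setdefault(name, []).append(row_id)
    return subtable


NUSI_SUBTABLE = {amp: build_local_nusi(rows) for amp, rows in BASE_TABLE.items()}


def nusi_lookup(name):
    """Probe every AMP's local subtable and collect the matching base rows."""
    amps_probed, matches = [], []
    for amp in sorted(BASE_TABLE):
        amps_probed.append(amp)
        wanted = set(NUSI_SUBTABLE[amp].get(name, []))
        matches.extend(row for row in BASE_TABLE[amp] if row[0] in wanted)
    return amps_probed, matches


amps_probed, adams_rows = nusi_lookup("Adams")
print(amps_probed, adams_rows)
```

All four AMPs are probed even though only two of them hold 'Adams' rows, which is why NUSI access is an all-AMP operation.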


Full Table Scans

Every row of the table must be read.

All AMPs scan their portion of the table in parallel.

Fast and efficient on Teradata due to parallelism.

Full table scans typically occur when either:

-An index is not used in the query

-An index is used in a non-equality test

Customer table: Cust_ID (USI), Cust_Name, Cust_Phone (NUPI)

Examples of Full Table Scans:

SELECT * FROM Customer WHERE Cust_Phone LIKE '524-____';

SELECT * FROM Customer WHERE Cust_Name = 'Davis';

SELECT * FROM Customer WHERE Cust_ID > 1000;


Partitioned Primary Indexes (PPI)

What is a Partitioned Primary Index or PPI?

-A new indexing mechanism in Teradata.

-Data rows can be grouped into partitions at the AMP level.

What advantages does a PPI provide?

-Increases the available options to improve the performance of certain types of queries.

-Only the rows of the qualified partitions in a query need to be accessed, which avoids full table scans.

Types of Partitioned Primary Index:

Range-based partitioning and case-based partitioning.

As always, data is distributed among AMPs and automatically placed within partitions.

In a table defined with a PPI, each row is uniquely identified by its Row Key.

Row Key = Partition # + Row Hash + Uniqueness Value


Logical Example of NPPI versus PPI

SELECT ... WHERE O_Date BETWEEN '2002-11-01' AND '2002-11-30';

In both tables, RH is the row hash, O_# is the order number, and O_Date is shown as YY/MM.

4 AMPs with Orders Table defined with PPI on O_Date. On each AMP the rows are grouped by partition (O_Date) and ordered by row hash within the partition, so the query reads only the 02/11 rows.

AMP 1                AMP 2                AMP 3                AMP 4
RH   O_#   O_Date    RH   O_#   O_Date    RH   O_#   O_Date    RH   O_#   O_Date
'14' 1001  02/09     '06' 1009  02/09     '04' 1008  02/09     '08' 1006  02/09
'35' 1007  02/09     '26' 1002  02/09     '24' 1004  02/09     '20' 1005  02/09
'39' 1011  02/09     '36' 1012  02/09     '32' 1003  02/09     '43' 1010  02/09
'03' 1016  02/10     '07' 1017  02/10     '09' 1018  02/10     '02' 1024  02/10
'17' 1013  02/10     '16' 1021  02/10     '27' 1014  02/10     '11' 1019  02/10
'48' 1023  02/10     '45' 1015  02/10     '44' 1022  02/10     '22' 1020  02/10
'01' 1028  02/11     '10' 1034  02/11     '19' 1025  02/11     '25' 1036  02/11
'12' 1031  02/11     '29' 1033  02/11     '40' 1035  02/11     '31' 1026  02/11
'28' 1032  02/11     '34' 1029  02/11     '47' 1027  02/11     '46' 1030  02/11
'23' 1040  02/12     '13' 1037  02/12     '05' 1048  02/12     '18' 1041  02/12
'30' 1038  02/12     '21' 1045  02/12     '15' 1042  02/12     '38' 1046  02/12
'42' 1047  02/12     '36' 1043  02/12     '33' 1039  02/12     '41' 1044  02/12

4 AMPs with Orders Table defined with NPPI. On each AMP the same rows are ordered by row hash only, so the dates are interleaved and the query must read every row.

AMP 1                AMP 2                AMP 3                AMP 4
RH   O_#   O_Date    RH   O_#   O_Date    RH   O_#   O_Date    RH   O_#   O_Date
'01' 1028  02/11     '06' 1009  02/09     '04' 1008  02/09     '02' 1024  02/10
'03' 1016  02/10     '07' 1017  02/10     '05' 1048  02/12     '08' 1006  02/09
'12' 1031  02/11     '10' 1034  02/11     '09' 1018  02/10     '11' 1019  02/10
'14' 1001  02/09     '13' 1037  02/12     '15' 1042  02/12     '18' 1041  02/12
'17' 1013  02/10     '16' 1021  02/10     '19' 1025  02/11     '20' 1005  02/09
'23' 1040  02/12     '21' 1045  02/12     '24' 1004  02/09     '22' 1020  02/10
'28' 1032  02/11     '26' 1002  02/09     '27' 1014  02/10     '25' 1036  02/11
'30' 1038  02/12     '29' 1033  02/11     '32' 1003  02/09     '31' 1026  02/11
'35' 1007  02/09     '34' 1029  02/11     '33' 1039  02/12     '38' 1046  02/12
'39' 1011  02/09     '36' 1012  02/09     '40' 1035  02/11     '41' 1044  02/12
'42' 1047  02/12     '36' 1043  02/12     '44' 1022  02/10     '43' 1010  02/09
'48' 1023  02/10     '45' 1015  02/10     '47' 1027  02/11     '46' 1030  02/11
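The difference in work between the two layouts can be counted with a small sketch. It uses the twelve AMP 1 rows from the slide; the two scan functions are illustrative models of a full scan and of partition elimination, not Teradata code.

```python
# Rows on AMP 1 from the slide: (row hash, order number, O_Date as YY/MM).
AMP1_ROWS = [
    ("14", 1001, "02/09"), ("35", 1007, "02/09"), ("39", 1011, "02/09"),
    ("03", 1016, "02/10"), ("17", 1013, "02/10"), ("48", 1023, "02/10"),
    ("01", 1028, "02/11"), ("12", 1031, "02/11"), ("28", 1032, "02/11"),
    ("23", 1040, "02/12"), ("30", 1038, "02/12"), ("42", 1047, "02/12"),
]


def scan_nppi(rows, wanted_month):
    """NPPI: rows are in row hash order, so every row must be read and tested."""
    rows_read, hits = 0, []
    for _row_hash, order_no, o_date in sorted(rows):
        rows_read += 1
        if o_date == wanted_month:
            hits.append(order_no)
    return rows_read, sorted(hits)


def scan_ppi(rows, wanted_month):
    """PPI: rows are grouped by partition, so only the qualifying partition is read."""
    partitions = {}
    for row_hash, order_no, o_date in rows:
        partitions.setdefault(o_date, []).append((row_hash, order_no))
    qualifying = partitions.get(wanted_month, [])
    return len(qualifying), sorted(order_no for _row_hash, order_no in qualifying)


nppi_read, nppi_orders = scan_nppi(AMP1_ROWS, "02/11")
ppi_read, ppi_orders = scan_ppi(AMP1_ROWS, "02/11")
print("NPPI rows read:", nppi_read, "PPI rows read:", ppi_read)
```

Both scans return the same three November orders, but the PPI version reads a quarter of the rows on this AMP.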


Partitioning with RANGE_N

CREATE TABLE Sales
( store_id INTEGER NOT NULL,
item_id INTEGER NOT NULL,
sales_date DATE FORMAT 'YYYY-MM-DD',
total_revenue DECIMAL(9,2),
total_sold INTEGER )
UNIQUE PRIMARY INDEX (store_id, item_id, sales_date)
PARTITION BY RANGE_N (
sales_date
BETWEEN DATE '2003-01-01' AND DATE '2003-12-31'
EACH INTERVAL '1' MONTH);

Notes:

Partition the current sales table into monthly partitions.

Assume the current sales table only has data for the first 3 months of 2003, but partitions are defined for the entire year 2003.

It is relatively easy to ALTER the table to extend the partitions for 2004.

A UPI is allowed because the partitioning columns are part of the PI.
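How a date maps to a partition under this definition can be sketched as follows. The function is a plain-Python model of monthly ranges between 2003-01-01 and 2003-12-31; it assumes partitions are numbered from 1 and that a date outside the range maps to no partition (modelled here as None), so treat it as an illustration rather than the RANGE_N implementation.

```python
from datetime import date

RANGE_START = date(2003, 1, 1)
RANGE_END = date(2003, 12, 31)


def range_n_month(sales_date):
    """Partition number (1-based) for monthly ranges, or None outside the range."""
    if sales_date is None or not (RANGE_START <= sales_date <= RANGE_END):
        return None
    months_from_start = (
        (sales_date.year - RANGE_START.year) * 12
        + (sales_date.month - RANGE_START.month)
    )
    return months_from_start + 1


print(range_n_month(date(2003, 3, 15)))   # a March sale
print(range_n_month(date(2004, 1, 1)))    # outside the defined ranges
```

Data for the first three months of 2003 would land in partitions 1 to 3, leaving the other nine partitions defined but empty.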


Partitioning with CASE_N

CREATE TABLE Sales_Revenue
( store_id INTEGER NOT NULL,
item_id INTEGER NOT NULL,
sales_date DATE FORMAT 'YYYY-MM-DD',
total_revenue DECIMAL(9,2),
total_sold INTEGER )
PRIMARY INDEX (store_id, item_id, sales_date)
PARTITION BY CASE_N
( total_revenue < 2000 , total_revenue < 4000 ,
total_revenue < 6000 , total_revenue < 8000 ,
...
NO CASE,
UNKNOWN);

Notes:

Partition the data based on total revenue for the products.

The NO CASE and UNKNOWN options allow for total_revenue >= 100,000 or unknown revenue.

A UPI is NOT allowed because the partitioning columns are NOT part of the PI.
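The way CASE_N picks a partition can be modelled with a short sketch. It uses only the four conditions visible in the listing (the remaining ones are not shown), and it assumes that the NO CASE partition is numbered right after the last condition and the UNKNOWN partition after that; the function is an illustration, not the CASE_N implementation.

```python
# Upper limits of the four conditions visible in the listing.
LIMITS = [2000, 4000, 6000, 8000]

NO_CASE_PARTITION = len(LIMITS) + 1   # assumed numbering: after the last condition
UNKNOWN_PARTITION = len(LIMITS) + 2   # assumed numbering: after NO CASE


def case_n(total_revenue):
    """Return the partition number for a revenue value under the sketch's rules."""
    if total_revenue is None:
        # A NULL revenue makes every comparison unknown.
        return UNKNOWN_PARTITION
    for number, limit in enumerate(LIMITS, start=1):
        if total_revenue < limit:
            return number   # first condition that is true wins
    return NO_CASE_PARTITION


for revenue in (1500, 3999.99, 7000, 250000, None):
    print(revenue, "->", case_n(revenue))
```

Because the conditions are tested in order, a value such as 1500 lands in the first partition even though it also satisfies the later conditions.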


Join Index :

