8/10/2019 Teradata-Day2
1/33
TERADATA- DAY 2
Teradata Indexes
Types of tables
Prepared By
nilKumar P
-Primary Index
  -Unique Primary Index (UPI)
  -Non-Unique Primary Index (NUPI)
  -No Primary Index (NOPI)
  -Partitioned Primary Index (PPI)
-Secondary Index
  -Unique Secondary Index (USI)
  -Non-Unique Secondary Index (NUSI)
-Join Index
  -Single Table Join Index (STJI)
  -Multi Table Join Index (MTJI)
  -Aggregate Join Index (AJI)
-Hash Index
-Types of tables
  -Set table
  -Multiset table
  -Derived table
  -Volatile table
  -Global Temporary Table
-Locks
Types of tables:
Derived tables are always local to a single SQL request. They are built dynamically using an additional SELECT within the query. The rows of the derived table are stored in spool and discarded as soon as the query finishes.
Volatile temporary tables are local to a session rather than to a specific query, so the table may be used repeatedly within a user session. That is the major difference between volatile temporary tables (multiple use) and derived tables (single use). Like a derived table, a volatile temporary table is materialized in spool space. However, it is not discarded until the session ends or the user manually drops it.
Global temporary tables (GTT) are local to a session, like volatile tables, but they use temporary space. The major difference is that the GTT definition is stored in the Data Dictionary, while the data is not. When the user's session ends, the data is deleted automatically but the definition remains.
Derived Table Example :
Ex 1 : A derived table that returns the average salary
SELECT * FROM (SELECT AVG(SAL) AS Avgsalary FROM Emp) AS TeraTom;
Ex 2 : A Derived Table that Joins to an Existing Table
Show all employees and their average salary per department.
SELECT Dept_No, First_Name, Last_Name, AVGSAL
FROM Employee_Table
INNER JOIN
(SELECT Dept_No, AVG(Salary) FROM Employee_Table
GROUP BY Dept_No) AS TeraTom (Dno, AVGSAL)
ON Dept_No = Dno;
The first three columns in the answer set come from Employee_Table. AVGSAL comes from the derived table named TeraTom. (SAMPLE is a reserved word in Teradata, so it is not used as the derived table name.)
Derived table name is tmp.
The table is required for this query but no others.
The query will be run only one time with this data.
Derived column names are Prodid and Sumsales.
The table is created in spool using the inner SELECT.
The SELECT statement is always in parentheses following FROM.
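The query these notes describe is not reproduced in the text. As a rough illustration only, the sketch below runs a hypothetical query of the same shape (derived table tmp with derived columns Prodid and Sumsales) through Python's built-in sqlite3 module rather than Teradata. The Daily_Sales table, its columns, and its data are assumptions made for the example.

```python
import sqlite3

# In-memory SQLite database standing in for a Teradata session (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Daily_Sales (Prodid INTEGER, Amount INTEGER)")
conn.executemany(
    "INSERT INTO Daily_Sales VALUES (?, ?)",
    [(1, 100), (1, 50), (2, 75), (3, 20)],
)

# The inner SELECT in parentheses after FROM is the derived table "tmp".
# It exists only for the duration of this one query.
query = """
SELECT tmp.Prodid, tmp.Sumsales
FROM (SELECT Prodid AS Prodid, SUM(Amount) AS Sumsales
      FROM Daily_Sales
      GROUP BY Prodid) AS tmp
WHERE tmp.Sumsales > 50
ORDER BY tmp.Prodid
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Once the statement finishes, tmp cannot be referenced again, which is the single-use behaviour the notes describe.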
Syntax: CREATE VOLATILE TABLE Dept_Agg_Vol , NO LOG
( Dept_no Integer
,Sum_Salary Decimal(10,2))
ON COMMIT PRESERVE ROWS ;
NO LOG allows for better performance.
LOG indicates that a transaction journal is maintained.
PRESERVE ROWS indicates keep table rows at TXN end.
DELETE ROWS indicates delete all table rows at TXN end.
The Three Steps to Use a Volatile Table :
1) Create the volatile table:
CREATE VOLATILE TABLE Dept_Agg_Vol , NO LOG
( Dept_no Integer
,Sum_Salary Decimal(10,2))
ON COMMIT PRESERVE ROWS ;
2) Populate the volatile table with an INSERT/SELECT statement:
INSERT INTO Dept_Agg_Vol
SELECT Dept_no,SUM(Salary)
FROM Employee_Table
GROUP BY Dept_no ;
3) Query it until you log off:
SELECT * FROM Dept_Agg_Vol
ORDER BY 1;
HELP VOLATILE TABLE ;
This command displays the names of all volatile temporary tables active for the current user session.
SessionID  TableName     TableId  Protection  CreatorName  CommitOption  TransactionLog
1010       Dept_Agg_Vol  10C0C04  N           Anil         P             N
CREATE Global Temporary TABLE Dept_Agg_GLO
( Dept_no Integer
,Sum_Salary Decimal(10,2))
ON COMMIT PRESERVE ROWS ;
Global temporary tables also have the LOG / NO LOG and ON COMMIT PRESERVE ROWS / DELETE ROWS options.
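The Three Steps to Use a Global Temporary Table :
1) Create the global temporary table (the definition is kept in the Data Dictionary):
CREATE GLOBAL TEMPORARY TABLE Dept_Agg_GLO
( Dept_no Integer
,Sum_Salary Decimal(10,2))
ON COMMIT PRESERVE ROWS ;
2) Populate it with an INSERT/SELECT statement:
INSERT INTO Dept_Agg_GLO
SELECT Dept_no,SUM(Salary)
FROM Employee_Table
GROUP BY Dept_no ;
3) Query it until you log off:
SELECT * FROM Dept_Agg_GLO
ORDER BY 1;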
Primary Index :
A Primary Index (PI) is the physical mechanism for assigning a data row to an AMP and to a location on that AMP's disks. It is also used to access rows without having to search the entire table.
The rows of every table are distributed among all AMPs. Each AMP is responsible for a subset of the rows of each table. Ideally, each table will be evenly distributed among all AMPs. Evenly distributed tables result in evenly distributed workloads. The uniformity of distribution of the rows of a table depends on the choice of the Primary Index.
Three purposes of a Primary Index
1-Distribution of rows to the proper AMP.
2-Fastest way to retrieve a single row.
3-Efficient access for joins.
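How much the choice of Primary Index matters for distribution can be sketched with a small simulation. This is an illustration under stated assumptions, not Teradata's implementation: SHA-256 stands in for the real hashing algorithm, a simple modulo stands in for the hash map, and the 4-AMP system and sample values are hypothetical.

```python
import hashlib
from collections import Counter

NUM_AMPS = 4  # hypothetical system size


def amp_for(value):
    """Map a PI value to an AMP. SHA-256 and modulo are stand-ins (assumptions)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    row_hash = int.from_bytes(digest[:4], "big")  # keep 32 bits, like a row hash
    return row_hash % NUM_AMPS


def rows_per_amp(pi_values):
    """Count how many rows land on each AMP for the given PI values."""
    counts = Counter(amp_for(v) for v in pi_values)
    return [counts.get(amp, 0) for amp in range(NUM_AMPS)]


order_numbers = range(7000, 17000)   # 10,000 unique values (UPI-like choice)
order_status = ["O", "C"] * 5000     # 10,000 rows, only two distinct values

unique_counts = rows_per_amp(order_numbers)
skewed_counts = rows_per_amp(order_status)
print("PI on order number:", unique_counts)
print("PI on order status:", skewed_counts)
```

With only two distinct PI values, at most two AMPs can receive rows, so those AMPs carry the whole workload while the others sit idle.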
A Hashing Example
Order table
Order Number (PK, UPI)  Customer Number  Order Date  Order Status
7325                    2                4/13        O
7324                    3                4/13        O
7415                    3                4/13        O
7415                    1                4/13        C
7103                    1                4/10        O
7225                    2                4/15        C
7384                    1                4/12        C
7402                    3                4/12        C
7188                    1                4/13        C
7202                    2                4/09        C
SELECT * FROM order WHERE order_number = 7202;
7202 -> Hashing Algorithm -> 691B 14AE (32 bit Row Hash)
Destination Selection Word (first 16 bits) | Remaining 16 bits
0110 1001 0001 1011 | 0001 0100 1010 1110
6 9 1 B | 1 4 A E
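The split of the 32-bit row hash into the Destination Selection Word and the remaining 16 bits is plain bit arithmetic. A minimal sketch using the row hash value from the slide (the variable names are illustrative):

```python
# Row hash for order number 7202, as given on the slide.
row_hash = 0x691B14AE

# The first (high-order) 16 bits form the Destination Selection Word.
destination_selection_word = row_hash >> 16

# The low-order 16 bits are the remaining part of the row hash.
remaining_bits = row_hash & 0xFFFF

print(f"{destination_selection_word:04X} {remaining_bits:04X}")
print(f"{row_hash:032b}")
```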
The Hash Map
7202 -> Hashing Algorithm -> 691B 14AE (Hexadecimal)
The Destination Selection Word (691B, the first 16 bits of the 32 bit Row Hash) selects row 691, column B of the hash map.
HASH MAP (partial)
      0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
690  07 06 07 06 07 04 05 06 05 05 14 09 14 13 03 04
691  15 08 02 04 01 00 14 14 03 02 03 09 01 00 02 15
692  01 00 15 11 14 14 13 13 14 14 08 09 15 10 09 09
693  07 06 15 13 11 06 15 08 15 15 08 08 11 07 05 10
694  04 12 11 13 05 10 07 07 03 02 11 04 01 00 11 13
695  11 11 12 10 03 02 06 13 01 00 06 05 07 06 05 12
Row 691, column B -> AMP 9
AMP 9 stores the row: 7202 2 4/09 C
Note: This partial Hash Map is based on a 16 AMP system and AMPs are shown in decimal format.
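The lookup on this slide can be sketched as a table lookup keyed by the Destination Selection Word. The map values below are copied from the partial hash map on the slide; the function and variable names are illustrative.

```python
# Partial hash map from the slide: the key is the first three hex digits of
# the Destination Selection Word, the list index is the fourth hex digit.
PARTIAL_HASH_MAP = {
    0x690: [7, 6, 7, 6, 7, 4, 5, 6, 5, 5, 14, 9, 14, 13, 3, 4],
    0x691: [15, 8, 2, 4, 1, 0, 14, 14, 3, 2, 3, 9, 1, 0, 2, 15],
    0x692: [1, 0, 15, 11, 14, 14, 13, 13, 14, 14, 8, 9, 15, 10, 9, 9],
    0x693: [7, 6, 15, 13, 11, 6, 15, 8, 15, 15, 8, 8, 11, 7, 5, 10],
    0x694: [4, 12, 11, 13, 5, 10, 7, 7, 3, 2, 11, 4, 1, 0, 11, 13],
    0x695: [11, 11, 12, 10, 3, 2, 6, 13, 1, 0, 6, 5, 7, 6, 5, 12],
}


def amp_for_row_hash(row_hash):
    """Return the AMP number that the partial hash map assigns to a row hash."""
    dsw = row_hash >> 16      # Destination Selection Word, e.g. 0x691B
    map_row = dsw >> 4        # first three hex digits, e.g. 0x691
    map_col = dsw & 0xF       # last hex digit, e.g. 0xB
    return PARTIAL_HASH_MAP[map_row][map_col]


amp = amp_for_row_hash(0x691B14AE)
print("AMP", amp)
```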
Identifying Rows
Consideration #1
A Row Hash = 32 bits = 4.2 billion possible values.
Because there is an infinite number of possible data values, some data values will have to share the same row hash.
Hash Synonyms: data values 1254 and 7769 -> Hash Algorithm -> 10A2 2936 and 10A2 2936
Consideration #2
A Primary Index may be non-unique (NUPI).
Different rows will have the same PI value and thus the same row hash.
NUPI Duplicates: (John) 'Smith' and (Dave) 'Smith' -> Hash Algorithm -> 0016 5557 and 0016 5557 (rows have the same hash)
Conclusion
A row hash is not adequate to uniquely identify a row.
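The RowID columns on the later slides (for example 778, 3 and 778, 7) pair a row hash with a uniqueness value, which is how rows sharing a row hash are told apart. A minimal sketch of that idea follows. The counter-based numbering is an assumption for illustration and does not reproduce how Teradata allocates uniqueness values internally.

```python
from collections import defaultdict


def assign_row_ids(rows_with_hash):
    """Pair each row hash with a per-hash counter so every Row ID is unique."""
    next_uniqueness = defaultdict(int)
    result = []
    for row_hash, row in rows_with_hash:
        next_uniqueness[row_hash] += 1
        row_id = (row_hash, next_uniqueness[row_hash])
        result.append((row_id, row))
    return result


# Both 'Smith' rows hash to 0016 5557 on the slide (NUPI duplicates).
rows = [
    (0x00165557, ("John", "Smith")),
    (0x00165557, ("Dave", "Smith")),
]
identified = assign_row_ids(rows)
for (row_hash, uniqueness), row in identified:
    print(f"{row_hash:08X}, {uniqueness} -> {row}")
```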
Secondary Index :
There are 3 general ways to access a table:
Primary Index access (one AMP access)
Secondary Index access (two or all AMP access)
Full Table Scan (all AMP access)
A secondary Index provides an alternate path to the rows of a table.
A table can have from 0 to 32 secondary indexes.
Secondary Indexes:
Do not affect table distribution.
Add overhead, both in terms of disk space and maintenance.
May be added or dropped dynamically as needed.
Are chosen to improve table performance
Choosing a Secondary Index
A Secondary Index may be defined ...
-at table creation (CREATE TABLE)
-following table creation (CREATE INDEX)
It supports up to 64 columns.
USI (Unique Secondary Index)
If the index choice of column(s) is unique, it is called a USI.
Accessing a row via a USI is a 2 AMP operation.
CREATE UNIQUE INDEX
(Employee_Number) ON Employee;
NUSI (Non-Unique Secondary Index)
If the index choice of column(s) is non-unique, it is called a NUSI.
Accessing row(s) via a NUSI is an all AMP operation.
CREATE INDEX
(Last_Name) ON Employee;
Notes:
Secondary Indexes cause an internal sub-table to be built.
Dropping the index causes the sub-table to be deleted.
Unique Secondary Index (USI) Access
Create USI:
CREATE UNIQUE INDEX (Cust) ON Customer;
Access via USI:
SELECT *
FROM Customer
WHERE Cust = 56;
Customer table (Table ID = 100): Cust is the USI and Phone is the NUPI.
Step 1: The PE passes USI Value = 56 through the Hashing Algorithm and sends Table ID 100, Row Hash 602, USI Value 56 to the Message Passing Layer (MPL).
Step 2: The AMP holding USI subtable row 602, 1 finds Cust 56 and its base table RowID 778, 7.
Step 3: Table ID 100, Row Hash 778, Unique Val 7 goes through the MPL to the AMP holding the base table row 56 Smith 555-7777.
USI Subtables (subtable RowID, Cust, base table RowID)
AMP 1
RowID   Cust  RowID
244, 1  74    884, 1
505, 1  77    639, 1
744, 4  51    915, 9
757, 1  27    388, 1
AMP 2
RowID   Cust  RowID
135, 1  98    555, 6
296, 1  84    536, 5
602, 1  56    778, 7
969, 1  49    147, 1
AMP 3
RowID   Cust  RowID
288, 1  31    638, 1
339, 1  40    640, 1
372, 2  45    471, 1
588, 1  95    778, 3
AMP 4
RowID   Cust  RowID
175, 1  37    107, 1
489, 1  72    717, 2
838, 4  12    147, 2
919, 1  62    822, 1
Base Tables (RowID, Cust, Name, Phone)
AMP 1
RowID   Cust  Name    Phone
471, 1  45    Adams   444-6666
555, 6  98    Brown   333-9999
717, 2  72    Adams   666-7777
884, 1  74    Smith   555-6666
AMP 2
RowID   Cust  Name    Phone
147, 1  49    Smith   111-6666
147, 2  12    Young   777-4444
388, 1  27    Jones   222-8888
822, 1  62    Black   444-5555
AMP 3
RowID   Cust  Name    Phone
107, 1  37    White   555-4444
536, 5  84    Rice    666-5555
638, 1  31    Adams   111-2222
640, 1  40    Smith   222-3333
AMP 4
RowID   Cust  Name    Phone
639, 1  77    Jones   777-6666
778, 3  95    Peters  555-7777
778, 7  56    Smith   555-7777
915, 9  51    Marsh   888-2222
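The two-AMP path of a USI lookup can be traced with a small model built from a subset of the slide's rows. This is a sketch under assumptions: the subtables are taken to sit on AMPs 1 to 4 in the order they are listed, the USI row hash (602) is supplied directly because the hashing algorithm itself is not shown, and the `owner` helper is a hypothetical stand-in for the hash map.

```python
# USI subtable rows per AMP: subtable RowID -> (Cust, base table RowID).
USI_SUBTABLES = {
    1: {(244, 1): (74, (884, 1)), (505, 1): (77, (639, 1))},
    2: {(135, 1): (98, (555, 6)), (602, 1): (56, (778, 7))},
    3: {(288, 1): (31, (638, 1)), (588, 1): (95, (778, 3))},
    4: {(175, 1): (37, (107, 1)), (838, 4): (12, (147, 2))},
}

# Base table rows per AMP: RowID -> (Cust, Name, Phone).
BASE_TABLES = {
    1: {(471, 1): (45, "Adams", "444-6666"), (884, 1): (74, "Smith", "555-6666")},
    2: {(147, 1): (49, "Smith", "111-6666"), (147, 2): (12, "Young", "777-4444")},
    3: {(107, 1): (37, "White", "555-4444"), (638, 1): (31, "Adams", "111-2222")},
    4: {(778, 3): (95, "Peters", "555-7777"), (778, 7): (56, "Smith", "555-7777")},
}


def owner(tables, row_hash):
    """Hypothetical stand-in for the hash map: which AMP holds this row hash?"""
    for amp, rows in tables.items():
        if any(h == row_hash for (h, _uniq) in rows):
            return amp
    raise KeyError(row_hash)


def usi_lookup(usi_value, usi_row_hash):
    """Return (AMPs touched, base row) for a lookup on the unique secondary index."""
    subtable_amp = owner(USI_SUBTABLES, usi_row_hash)           # first AMP
    base_row_id = next(
        rid
        for (h, _uniq), (cust, rid) in USI_SUBTABLES[subtable_amp].items()
        if h == usi_row_hash and cust == usi_value
    )
    base_amp = owner(BASE_TABLES, base_row_id[0])               # second AMP
    return [subtable_amp, base_amp], BASE_TABLES[base_amp][base_row_id]


amps_touched, row = usi_lookup(56, 602)
print(amps_touched, row)
```

Only the AMP holding the subtable row and the AMP holding the base row take part, which is why the deck calls USI access a 2 AMP operation.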
Non-Unique Secondary Index (NUSI) Access
Create NUSI:
CREATE INDEX (Name) ON Customer;
Access via NUSI:
SELECT *
FROM Customer
WHERE Name = 'Adams';
Customer table (Table ID = 100): Name is the NUSI and Phone is the NUPI.
Step 1: The PE passes NUSI Value = 'Adams' through the Hashing Algorithm and sends Table ID 100, Row Hash 567, NUSI Value Adams to the Message Passing Layer (MPL), which delivers it to all AMPs.
Step 2: Each AMP searches its own NUSI subtable for the value Adams.
Step 3: Each AMP that finds a match reads the base table rows on the same AMP using the RowIDs stored in its subtable.
NUSI Subtables (subtable RowID, Name, base table RowIDs). Each subtable is listed under the AMP whose base table rows it points to.
AMP 1
RowID   Name    RowID
432, 3  Smith   884, 1
567, 2  Adams   471, 1 and 717, 2
852, 1  Brown   555, 6
AMP 2
RowID   Name    RowID
432, 1  Smith   147, 1
448, 4  Black   822, 1
567, 6  Jones   388, 1
770, 1  Young   147, 2
AMP 3
RowID   Name    RowID
432, 8  Smith   640, 1
448, 1  White   107, 1
567, 3  Adams   638, 1
656, 1  Rice    536, 5
AMP 4
RowID   Name    RowID
155, 1  Marsh   915, 9
396, 1  Peters  778, 3
432, 5  Smith   778, 7
567, 1  Jones   639, 1
Base Tables (RowID, Cust, Name, Phone)
AMP 1
RowID   Cust  Name    Phone
471, 1  45    Adams   444-6666
555, 6  98    Brown   333-9999
717, 2  72    Adams   666-7777
884, 1  74    Smith   555-6666
AMP 2
RowID   Cust  Name    Phone
147, 1  49    Smith   111-6666
147, 2  12    Young   777-4444
388, 1  27    Jones   222-8888
822, 1  62    Black   444-5555
AMP 3
RowID   Cust  Name    Phone
107, 1  37    White   555-4444
536, 5  84    Rice    666-5555
638, 1  31    Adams   111-2222
640, 1  40    Smith   222-3333
AMP 4
RowID   Cust  Name    Phone
639, 1  77    Jones   777-6666
778, 3  95    Peters  555-7777
778, 7  56    Smith   555-7777
915, 9  51    Marsh   888-2222
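The all-AMP nature of NUSI access can be modeled with the slide's data. This is a sketch under assumptions: each NUSI subtable is placed on the AMP that holds the base rows it points to, subtable RowIDs are omitted for brevity, and the entry for Jones on AMP 2 uses RowID 388, 1 to match the base table.

```python
# NUSI subtable per AMP: Name -> base table RowIDs stored on that same AMP.
NUSI_SUBTABLES = {
    1: {"Smith": [(884, 1)], "Adams": [(471, 1), (717, 2)], "Brown": [(555, 6)]},
    2: {"Smith": [(147, 1)], "Black": [(822, 1)], "Jones": [(388, 1)], "Young": [(147, 2)]},
    3: {"Smith": [(640, 1)], "White": [(107, 1)], "Adams": [(638, 1)], "Rice": [(536, 5)]},
    4: {"Marsh": [(915, 9)], "Peters": [(778, 3)], "Smith": [(778, 7)], "Jones": [(639, 1)]},
}

# Base table rows per AMP: RowID -> (Cust, Name, Phone).
BASE_TABLES = {
    1: {(471, 1): (45, "Adams", "444-6666"), (555, 6): (98, "Brown", "333-9999"),
        (717, 2): (72, "Adams", "666-7777"), (884, 1): (74, "Smith", "555-6666")},
    2: {(147, 1): (49, "Smith", "111-6666"), (147, 2): (12, "Young", "777-4444"),
        (388, 1): (27, "Jones", "222-8888"), (822, 1): (62, "Black", "444-5555")},
    3: {(107, 1): (37, "White", "555-4444"), (536, 5): (84, "Rice", "666-5555"),
        (638, 1): (31, "Adams", "111-2222"), (640, 1): (40, "Smith", "222-3333")},
    4: {(639, 1): (77, "Jones", "777-6666"), (778, 3): (95, "Peters", "555-7777"),
        (778, 7): (56, "Smith", "555-7777"), (915, 9): (51, "Marsh", "888-2222")},
}


def nusi_lookup(name):
    """Every AMP probes its own subtable, then reads matching rows locally."""
    amps_probed, rows = [], []
    for amp in sorted(NUSI_SUBTABLES):
        amps_probed.append(amp)
        for row_id in NUSI_SUBTABLES[amp].get(name, []):
            rows.append(BASE_TABLES[amp][row_id])
    return amps_probed, rows


amps_probed, adams_rows = nusi_lookup("Adams")
print(amps_probed)
print(adams_rows)
```

All four AMPs are probed even though only AMPs 1 and 3 hold rows for Adams, which is the difference from the 2 AMP USI path.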
Full Table Scans
Every row of the table must be read.
All AMPs scan their portion of the table in parallel.
Fast and efficient on Teradata due to parallelism.
Full table scans typically occur when either:
-An index is not used in the query
-An index is used in a non-equality test
Customer table: Cust_ID (USI), Cust_Name (no index), Cust_Phone (NUPI)
Examples of Full Table Scans:
SELECT * FROM Customer WHERE Cust_Phone LIKE '524-____';
SELECT * FROM Customer WHERE Cust_Name = 'Davis';
SELECT * FROM Customer WHERE Cust_ID > 1000;
Partitioned Primary Indexes (PPI)
What is a Partitioned Primary Index or PPI?
A new indexing mechanism in Teradata.
Data rows can be grouped into partitions at the AMP level.
What advantages does a PPI provide?
Increases the available options to improve the performance of certain types of queries.
Only the rows of the qualified partitions in a query need to be accessed, which avoids full table scans.
Types of Partitioned Primary Index :
Range-based partitioning (RANGE_N) and case-based partitioning (CASE_N).
As always, data is distributed among AMPs and automatically placed within partitions.
In a table defined with a PPI, each row is uniquely identified by its Row Key.
Row Key = Partition # + Row Hash + Uniqueness Value
Logical Example of NPPI versus PPI
Query:
SELECT ...
WHERE O_Date BETWEEN '2002-11-01' AND '2002-11-30';
Each entry below is RH (row hash), O_# and, where shown, O_Date.
4 AMPs with Orders Table defined with PPI on O_Date (rows on each AMP are grouped by O_Date partition):
AMP 1
02/09: '14' 1001, '35' 1007, '39' 1011
02/10: '03' 1016, '17' 1013, '48' 1023
02/11: '01' 1028, '12' 1031, '28' 1032
02/12: '23' 1040, '30' 1038, '42' 1047
AMP 2
02/09: '06' 1009, '26' 1002, '36' 1012
02/10: '07' 1017, '16' 1021, '45' 1015
02/11: '10' 1034, '29' 1033, '34' 1029
02/12: '13' 1037, '21' 1045, '36' 1043
AMP 3
02/09: '04' 1008, '24' 1004, '32' 1003
02/10: '09' 1018, '27' 1014, '44' 1022
02/11: '19' 1025, '40' 1035, '47' 1027
02/12: '05' 1048, '15' 1042, '33' 1039
AMP 4
02/09: '08' 1006, '20' 1005, '43' 1010
02/10: '02' 1024, '11' 1019, '22' 1020
02/11: '25' 1036, '31' 1026, '46' 1030
02/12: '18' 1041, '38' 1046, '41' 1044
With the PPI, only the 02/11 partition on each AMP has to be read for the query.
4 AMPs with Orders Table defined with NPPI (the same rows on each AMP, kept in row hash order, so O_Date values are interleaved):
AMP 1
'01' 1028 02/11, '03' 1016 02/10, '12' 1031 02/11, '14' 1001 02/09,
'17' 1013 02/10, '23' 1040 02/12, '28' 1032 02/11, '30' 1038 02/12,
'35' 1007 02/09, '39' 1011 02/09, '42' 1047 02/12, '48' 1023 02/10
AMP 2
'06' 1009 02/09, '07' 1017 02/10, '10' 1034 02/11, '13' 1037 02/12,
'16' 1021 02/10, '21' 1045 02/12, '26' 1002 02/09, '29' 1033 02/11,
'34' 1029 02/11, '36' 1012 02/09, '36' 1043 02/12, '45' 1015 02/10
AMP 3
'04' 1008 02/09, '05' 1048 02/12, '09' 1018 02/10, '15' 1042 02/12,
'19' 1025 02/11, '24' 1004 02/09, '27' 1014 02/10, '32' 1003 02/09,
'33' 1039 02/12, '40' 1035 02/11, '44' 1022 02/10, '47' 1027 02/11
AMP 4
'02' 1024 02/10, '08' 1006 02/09, '11' 1019 02/10, '18' 1041 02/12,
'20' 1005 02/09, '22' 1020 02/10, '25' 1036 02/11, '31' 1026 02/11,
'38' 1046 02/12, '41' 1044 02/12, '43' 1010 02/09, '46' 1030 02/11
With the NPPI, every row on every AMP has to be read for the query.
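The difference in rows read can be sketched for a single AMP. The rows below are the AMP 1 rows from the slide. The sketch assumes that O_Date values such as 02/11 denote year and month, so the query for November 2002 qualifies only the 02/11 partition. The function names are illustrative.

```python
# AMP 1 rows from the slide: (row hash, order number, O_Date as YY/MM).
AMP1_ROWS = [
    ("01", 1028, "02/11"), ("03", 1016, "02/10"), ("12", 1031, "02/11"),
    ("14", 1001, "02/09"), ("17", 1013, "02/10"), ("23", 1040, "02/12"),
    ("28", 1032, "02/11"), ("30", 1038, "02/12"), ("35", 1007, "02/09"),
    ("39", 1011, "02/09"), ("42", 1047, "02/12"), ("48", 1023, "02/10"),
]


def scan_nppi(rows, wanted_month):
    """Without partitioning, every row on the AMP is read and then filtered."""
    rows_read, hits = 0, []
    for row in rows:
        rows_read += 1
        if row[2] == wanted_month:
            hits.append(row)
    return rows_read, hits


def scan_ppi(rows, wanted_month):
    """With partitioning on O_Date, only the qualifying partition is read."""
    partitions = {}
    for row in rows:
        partitions.setdefault(row[2], []).append(row)
    qualified = partitions.get(wanted_month, [])
    return len(qualified), list(qualified)


nppi_read, nppi_hits = scan_nppi(AMP1_ROWS, "02/11")
ppi_read, ppi_hits = scan_ppi(AMP1_ROWS, "02/11")
print("NPPI rows read:", nppi_read, "PPI rows read:", ppi_read)
```

Both scans return the same three orders, but the partitioned layout reads a quarter of the rows on this AMP.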
Partitioning with RANGE_N
CREATE TABLE Sales
( store_id INTEGER NOT NULL,
item_id INTEGER NOT NULL,
sales_date DATE FORMAT 'YYYY-MM-DD',
total_revenue DECIMAL(9,2),
total_sold INTEGER)
UNIQUE PRIMARY INDEX (store_id, item_id, sales_date)
PARTITION BY RANGE_N (
sales_date
BETWEEN DATE '2003-01-01' AND DATE '2003-12-31'
EACH INTERVAL '1' MONTH);
Notes:
Partition the current sales table into monthly partitions.
Assume the current sales table only has data for the first 3 months of 2003, but we have defined partitions for the entire year 2003.
It is relatively easy to ALTER the table to extend the partitions for 2004.
A UPI is allowed because the partitioning column (sales_date) is part of the PI.
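How a monthly RANGE_N expression maps a date to a partition number can be sketched as follows. This is a simplified model of the expression in the DDL above (2003-01-01 to 2003-12-31, one partition per month), not Teradata's implementation. Returning None for dates outside the defined range is a modeling choice for this sketch, and the function name is illustrative.

```python
from datetime import date

RANGE_START = date(2003, 1, 1)
RANGE_END = date(2003, 12, 31)


def range_n_month(sales_date):
    """Return the 1-based monthly partition number, or None if out of range."""
    if sales_date is None or not (RANGE_START <= sales_date <= RANGE_END):
        return None
    months_from_start = (
        (sales_date.year - RANGE_START.year) * 12
        + (sales_date.month - RANGE_START.month)
    )
    return months_from_start + 1


for d in (date(2003, 1, 1), date(2003, 3, 15), date(2003, 12, 31), date(2004, 1, 1)):
    print(d, "->", range_n_month(d))
```

A date in 2004 falls outside every defined range, which is why the notes mention extending the partitions with ALTER before 2004 data arrives.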
Partitioning with CASE_N
CREATE TABLE Sales_Revenue
( store_id INTEGER NOT NULL,
item_id INTEGER NOT NULL,
sales_date DATE FORMAT 'YYYY-MM-DD',
total_revenue DECIMAL(9,2),
total_sold INTEGER)
PRIMARY INDEX (store_id, item_id, sales_date)
PARTITION BY CASE_N
( total_revenue < 2000 , total_revenue < 4000 ,
total_revenue < 6000 , total_revenue < 8000 ,
total_revenue < 10000 , total_revenue < 20000 ,
total_revenue < 50000 , total_revenue < 100000 ,
NO CASE,
UNKNOWN);
Notes:
Partition the data based on total revenue for the products.
The NO CASE and UNKNOWN options allow for total_revenue >= 100,000 or unknown revenue.
A UPI is NOT allowed because the partitioning column (total_revenue) is NOT part of the PI.
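The way the CASE_N list assigns a partition number can be sketched as a first-match search over the conditions. This is a simplified model of the expression in the DDL above: the thresholds come from the slide, the first true condition wins, NO CASE catches values that match no condition, and UNKNOWN catches NULL revenue (modeled here as None). The function name is illustrative.

```python
# Upper bounds of the eight CASE_N conditions, in the order they are listed.
LIMITS = [2000, 4000, 6000, 8000, 10000, 20000, 50000, 100000]
NO_CASE_PARTITION = len(LIMITS) + 1   # partition 9
UNKNOWN_PARTITION = len(LIMITS) + 2   # partition 10


def case_n_partition(total_revenue):
    """Return the 1-based partition number for a total_revenue value."""
    if total_revenue is None:          # NULL revenue: conditions are unknown
        return UNKNOWN_PARTITION
    for number, limit in enumerate(LIMITS, start=1):
        if total_revenue < limit:      # first condition that is true wins
            return number
    return NO_CASE_PARTITION           # no condition matched


for value in (1500, 2000, 99999.99, 100000, None):
    print(value, "->", case_n_partition(value))
```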
Join Index :