Teradata-Day2

  • 8/10/2019 Teradata-Day2

    1/33

    TERADATA- DAY 2

    Teradata Indexes

    Types of tables

    Prepared By

    nilKumar P

    -Primary index

    -Unique Primary Index (UPI)

    -Non Unique Primary Index(NUPI)

    -No Primary Index (NOPI)

    -Partition Primary Index(PPI)

    -Secondary Index

    -Unique Secondary Index (USI)

    -Non Unique Secondary Index(NUSI)

    -Join Index

    -Single Table Join Index (STJI)

    -Multi table Join Index (MTJI)

    -Aggregate Join Index (AJI)

    -Hash Index

    -Types of tables

    -Set table

    -Multi set table

    -Derived table

    -Volatile table

    -Global Temporary Table

    -Locks

    Types of tables:

    Derived tables are always local to a single SQL request. They are built dynamically using an additional SELECT within the query. The rows of the derived table are stored in spool and discarded as soon as the query finishes.

    Volatile temporary tables are local to a session rather than to a specific query, so the table may be used repeatedly within a user session. That is the major difference between volatile temporary tables (multiple use) and derived tables (single use). Like a derived table, a volatile temporary table is materialized in spool space. However, it is not discarded until the session ends or the user manually drops it.

    Global temporary tables (GTTs) are local to a session, like volatile tables, but they use temporary space. The major difference is that the GTT definition is stored in the Data Dictionary, while the data is not. When the user ends the session, the data is deleted automatically but the definition remains.

    Derived Table Example :

    Ex 1 : SELECT * FROM (SELECT AVG(SAL) AS Avgsalary FROM Emp) sample;

    Ex 2 : A Derived Table that Joins to an Existing Table

    Show all employees and their average salary per department.

    SELECT Dept_No, First_Name, Last_Name, AVGSAL
    FROM Employee_Table
    INNER JOIN
    (SELECT Dept_No, AVG(Salary) FROM Employee_Table
     GROUP BY Dept_No) AS Sample (Dno, AVGSAL)
    ON Dept_No = Dno;

    The first three columns in the answer set come from Employee_Table. AVGSAL comes from the derived table named Sample.

    Derived table name is tmp.

    The table is required for this query but no others.

    The query will be run only one time with this data.

    Derived column names are Prodid and Sumsales.

    Table is created in spool using the inner SELECT.

    The inner SELECT statement is always in parentheses following FROM.
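The slide's query itself is not shown, so the points above can be sketched with a small runnable stand-in. This uses Python's built-in sqlite3 module rather than Teradata, and the table daily_sales, its data, and the filter value are hypothetical; only the derived table name tmp and its columns prodid and sumsales come from the slide.

```python
import sqlite3

# Hypothetical source table; the slide does not show the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (prodid INTEGER, sales REAL)")
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?)",
    [(1, 100.0), (1, 50.0), (2, 75.0), (3, 20.0), (3, 30.0)],
)

# The inner SELECT in parentheses after FROM is the derived table "tmp".
# It exists only for this one request and exposes prodid and sumsales.
query = """
    SELECT tmp.prodid, tmp.sumsales
    FROM (SELECT prodid, SUM(sales) AS sumsales
          FROM daily_sales
          GROUP BY prodid) AS tmp
    WHERE tmp.sumsales > 60
    ORDER BY tmp.prodid
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Once the request finishes, tmp no longer exists; a second query that needed the same totals would have to repeat the inner SELECT.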

    Volatile temporary tables are local to a session rather than to a specific query, so the table may be used repeatedly within a user session. That is the major difference between volatile temporary tables (multiple use) and derived tables (single use). Like a derived table, a volatile temporary table is materialized in spool space. However, it is not discarded until the session ends or the user manually drops it.

    Syntax: CREATE VOLATILE TABLE Dept_Agg_Vol , NO LOG

    ( Dept_no Integer

    ,Sum_Salary Decimal(10,2))

    ON COMMIT PRESERVE ROWS ;

    NO LOG allows for better performance.

    LOG indicates that a transaction journal is maintained.

    PRESERVE ROWS indicates keep table rows at TXN end.

    DELETE ROWS indicates delete all table rows at TXN end.

    The Three Steps to Use a Volatile Table :

    CREATE VOLATILE TABLE Dept_Agg_Vol , NO LOG
    ( Dept_no Integer
    ,Sum_Salary Decimal(10,2)
    )
    ON COMMIT PRESERVE ROWS ;

    INSERT INTO Dept_Agg_Vol

    SELECT Dept_no,SUM(Salary)

    FROM Employee_Table

    GROUP BY Dept_no ;

    SELECT * FROM Dept_Agg_Vol

    ORDER BY 1;

    1) A user creates a volatile table,

    2) populates it with an INSERT/SELECT statement, and then

    3) queries it until logoff.
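The three steps can be tried end to end without a Teradata system. The sketch below is only an analogy: it uses a SQLite TEMP table (via Python's sqlite3), which is also private to one connection, in place of a Teradata volatile table, and the sample salaries are hypothetical. NO LOG and ON COMMIT PRESERVE ROWS have no SQLite equivalent and are left out.

```python
import sqlite3

# One connection stands in for one user session.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee_Table (Dept_no INTEGER, Salary REAL)")
conn.executemany(
    "INSERT INTO Employee_Table VALUES (?, ?)",
    [(10, 1000.0), (10, 2000.0), (20, 1500.0)],  # hypothetical rows
)

# Step 1: create the session-local table (a SQLite TEMP table here).
conn.execute("CREATE TEMP TABLE Dept_Agg_Vol (Dept_no INTEGER, Sum_Salary REAL)")

# Step 2: populate it with an INSERT/SELECT.
conn.execute(
    "INSERT INTO Dept_Agg_Vol "
    "SELECT Dept_no, SUM(Salary) FROM Employee_Table GROUP BY Dept_no"
)

# Step 3: query it as often as needed while the session is open.
first = conn.execute("SELECT * FROM Dept_Agg_Vol ORDER BY 1").fetchall()
second = conn.execute("SELECT COUNT(*) FROM Dept_Agg_Vol").fetchone()[0]
print(first, second)
```

The point of the pattern is that the aggregate is computed once in step 2 and then reused by every later query in the session.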

    HELP VOLATILE TABLE ;

    This command displays the names of all volatile temporary tables active for the current user session.

    SessionID  TableName     TableId  Protection  CreatorName  CommitOption  TransactionLog
    1010       Dept_Agg_Vol  10C0C04  N           Anil         P             N

    CREATE Global Temporary TABLE Dept_Agg_GLO

    ( Dept_no Integer

    ,Sum_Salary Decimal(10,2))

    ON COMMIT PRESERVE ROWS ;

    Global temporary tables have LOG and ON COMMIT PRESERVE/DELETE options.

    Global temporary tables are local to a session, like volatile tables, but they use temporary space. The major difference is that the GTT definition is stored in the Data Dictionary, while the data is not. When the user ends the session, the data is deleted automatically but the definition remains.

    The Three Steps to using a Global Temporary Table

    CREATE Global Temporary TABLE Dept_Agg_GLO
    ( Dept_no Integer
    ,Sum_Salary Decimal(10,2)
    )
    ON COMMIT PRESERVE ROWS ;

    INSERT INTO Dept_Agg_GLO

    SELECT Dept_no,SUM(Salary)

    FROM Employee_Table

    GROUP BY Dept_no ;

    SELECT * FROM Dept_Agg_GLO

    ORDER BY 1;

    Primary Index :

    A Primary Index (PI) is the physical mechanism for assigning a data row to an AMP and a location on the AMP's disks. It is also used to access rows without having to search the entire table.

    The rows of every table are distributed among all AMPs. Each AMP is responsible for a subset of the rows of each table. Ideally, each table will be evenly distributed among all AMPs. Evenly distributed tables result in evenly distributed workloads. The uniformity of distribution of the rows of a table depends on the choice of the Primary Index.

    Three purposes of a Primary Index

    1-Distribution of rows to the proper AMP.

    2-Fastest way to retrieve a single row.

    3-Accessing joins.

    A Hashing Example

    Order table (OrderNumber is the PK and the UPI)

    OrderNumber  CustomerNumber  OrderDate  OrderStatus
    7325         2               4/13       O
    7324         3               4/13       O
    7415         3               4/13       O
    7415         1               4/13       C
    7103         1               4/10       O
    7225         2               4/15       C
    7384         1               4/12       C
    7402         3               4/12       C
    7188         1               4/13       C
    7202         2               4/09       C

    SELECT * FROM order WHERE order_number = 7202;

    7202 -> Hashing Algorithm -> 691B 14AE (32 bit Row Hash)

    Destination Selection Word (first 16 bits): 691B = 0110 1001 0001 1011

    Remaining 16 bits: 14AE = 0001 0100 1010 1110

    The Hash Map

    7202 -> Hashing Algorithm -> 691B 14AE (hexadecimal, 32 bit Row Hash)

    Destination Selection Word (first 16 bits): 691B = 0110 1001 0001 1011

    Remaining 16 bits: 14AE = 0001 0100 1010 1110

    HASH MAP

          0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
    690  07 06 07 06 07 04 05 06 05 05 14 09 14 13 03 04
    691  15 08 02 04 01 00 14 14 03 02 03 09 01 00 02 15
    692  01 00 15 11 14 14 13 13 14 14 08 09 15 10 09 09
    693  07 06 15 13 11 06 15 08 15 15 08 08 11 07 05 10
    694  04 12 11 13 05 10 07 07 03 02 11 04 01 00 11 13
    695  11 11 12 10 03 02 06 13 01 00 06 05 07 06 05 12

    Destination Selection Word 691B: row 691, column B -> AMP 9

    AMP 9 stores the row: 7202 2 4/09 C

    Note: This partial Hash Map is based on a 16 AMP system and AMPs are shown in decimal format.
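The lookup from row hash to AMP can be traced in a few lines of Python. This is a minimal sketch using only the numbers on the slide: the row hash 691B 14AE is taken as given (the hashing algorithm itself is not reproduced here), and the function name amp_for_row_hash is hypothetical.

```python
# Row hash for OrderNumber 7202, copied from the slide (not computed here).
ROW_HASH = 0x691B14AE

# Partial hash map from the slide. Key: first three hex digits of the
# Destination Selection Word; list index: fourth hex digit; value: AMP number.
HASH_MAP = {
    0x690: [7, 6, 7, 6, 7, 4, 5, 6, 5, 5, 14, 9, 14, 13, 3, 4],
    0x691: [15, 8, 2, 4, 1, 0, 14, 14, 3, 2, 3, 9, 1, 0, 2, 15],
    0x692: [1, 0, 15, 11, 14, 14, 13, 13, 14, 14, 8, 9, 15, 10, 9, 9],
    0x693: [7, 6, 15, 13, 11, 6, 15, 8, 15, 15, 8, 8, 11, 7, 5, 10],
    0x694: [4, 12, 11, 13, 5, 10, 7, 7, 3, 2, 11, 4, 1, 0, 11, 13],
    0x695: [11, 11, 12, 10, 3, 2, 6, 13, 1, 0, 6, 5, 7, 6, 5, 12],
}


def amp_for_row_hash(row_hash):
    """Return the AMP that owns a 32-bit row hash, per the partial map above."""
    dsw = row_hash >> 16        # Destination Selection Word: high 16 bits
    map_row = dsw >> 4          # first three hex digits select the map row
    map_col = dsw & 0xF         # last hex digit selects the column
    return HASH_MAP[map_row][map_col]


amp = amp_for_row_hash(ROW_HASH)
print(f"DSW {ROW_HASH >> 16:04X} -> AMP {amp}")
```

Only the Destination Selection Word takes part in choosing the AMP; the remaining 16 bits (14AE) stay with the row as part of its row hash.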

    Identifying Rows

    Consideration #1

    A Row Hash = 32 bits = 4.2 billion possible values.

    Because there is an infinite number of possible data values, some data values will have to share the same row hash.

    Data value input   Hash Algorithm output
    1254               10A2 2936
    7769               10A2 2936   (Hash Synonyms)

    Consideration #2

    A Primary Index may be non-unique (NUPI). Different rows will have the same PI value and thus the same row hash.

    NUPI value input   Hash Algorithm output
    (John) 'Smith'     0016 5557
    (Dave) 'Smith'     0016 5557   (NUPI Duplicates: rows have the same hash)

    Conclusion

    A row hash is not adequate to uniquely identify a row.
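Elsewhere the deck pairs each row hash with a second number (the USI figure shows RowIDs such as 778, 3 and 778, 7, and the PPI slide names a Uniqueness Value). A minimal sketch of that idea follows, assuming the uniqueness value is a per-hash counter starting at 1; the function name assign_row_ids and the third sample row are hypothetical.

```python
from collections import defaultdict


def assign_row_ids(rows_with_hash):
    """Pair each row with (row_hash, uniqueness_value).

    Assumption: the uniqueness value counts up from 1 for each row hash,
    in arrival order, so rows sharing a hash still get distinct RowIDs.
    """
    next_unique = defaultdict(int)
    result = []
    for row_hash, row in rows_with_hash:
        next_unique[row_hash] += 1
        result.append(((row_hash, next_unique[row_hash]), row))
    return result


rows = [
    (0x00165557, "John Smith"),   # NUPI duplicates from the slide
    (0x00165557, "Dave Smith"),
    (0x10A22936, "other row"),    # hypothetical row with a different hash
]
row_ids = assign_row_ids(rows)
for row_id, row in row_ids:
    print(f"{row_id[0]:08X}, {row_id[1]} -> {row}")
```

The same counter also separates hash synonyms, since it depends only on the row hash and not on the data value that produced it.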

    Secondary Index :

    There are 3 general ways to access a table:

    Primary Index access (one AMP access)

    Secondary Index access (two or all AMP access)

    Full Table Scan (all AMP access)

    A secondary Index provides an alternate path to the rows of a table.

    A table can have from 0 to 32 secondary indexes.

    Secondary Indexes:

    Do not affect table distribution.

    Add overhead, both in terms of disk space and maintenance.

    May be added or dropped dynamically as needed.

    Are chosen to improve table performance.

    Choosing a Secondary Index

    A Secondary Index may be defined ...

    at table creation (CREATE TABLE)

    following table creation (CREATE INDEX)

    on up to 64 columns

    USI (Unique Secondary Index)

    If the index choice of column(s) is unique, it is called a USI.

    Accessing a row via a USI is a 2 AMP operation.

    CREATE UNIQUE INDEX (Employee_Number) ON Employee;

    NUSI (Non-Unique Secondary Index)

    If the index choice of column(s) is non-unique, it is called a NUSI.

    Accessing row(s) via a NUSI is an all AMP operation.

    CREATE INDEX (Last_Name) ON Employee;

    Notes:

    Secondary Indexes cause an internal sub-table to be built.

    Dropping the index causes the sub-table to be deleted.

    Unique Secondary Index (USI) Access

    Create USI

    CREATE UNIQUE INDEX (Cust) ON Customer;

    Access via USI

    SELECT *
    FROM Customer
    WHERE Cust = 56;

    Customer: Table ID = 100. Cust is the USI; Phone is the NUPI.

    PE: USI Value = 56 -> Hashing Algorithm -> Table ID 100, Row Hash 602, USI Value 56 -> to MPL

    USI subtable row 602, 1 (Cust 56) -> Table ID 100, Row Hash 778, Unique Val 7 -> Message Passing Layer -> base table row 778, 7

    USI Subtable

    AMP 1
    RowID   Cust  RowID
    244, 1  74    884, 1
    505, 1  77    639, 1
    744, 4  51    915, 9
    757, 1  27    388, 1

    AMP 2
    RowID   Cust  RowID
    135, 1  98    555, 6
    296, 1  84    536, 5
    602, 1  56    778, 7
    969, 1  49    147, 1

    AMP 3
    RowID   Cust  RowID
    288, 1  31    638, 1
    339, 1  40    640, 1
    372, 2  45    471, 1
    588, 1  95    778, 3

    AMP 4
    RowID   Cust  RowID
    175, 1  37    107, 1
    489, 1  72    717, 2
    838, 4  12    147, 2
    919, 1  62    822, 1

    Base Table

    AMP 1
    RowID   Cust  Name    Phone
    471, 1  45    Adams   444-6666
    555, 6  98    Brown   333-9999
    717, 2  72    Adams   666-7777
    884, 1  74    Smith   555-6666

    AMP 2
    RowID   Cust  Name    Phone
    147, 1  49    Smith   111-6666
    147, 2  12    Young   777-4444
    388, 1  27    Jones   222-8888
    822, 1  62    Black   444-5555

    AMP 3
    RowID   Cust  Name    Phone
    107, 1  37    White   555-4444
    536, 5  84    Rice    666-5555
    638, 1  31    Adams   111-2222
    640, 1  40    Smith   222-3333

    AMP 4
    RowID   Cust  Name    Phone
    639, 1  77    Jones   777-6666
    778, 3  95    Peters  555-7777
    778, 7  56    Smith   555-7777
    915, 9  51    Marsh   888-2222
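The two-AMP path for WHERE Cust = 56 can be followed in a short Python sketch built from the figure's values. Assumptions: AMP numbers follow the order in which the figure's groups are listed, the small AMP_OF_HASH dictionary stands in for the real hash map, and the name usi_lookup is hypothetical. Only the rows needed for this lookup are included.

```python
# Row hash of the USI value, from the figure ("Row Hash 602, USI Value 56").
USI_HASH = {56: 602}

# Stand-in for the hash map: which AMP owns which row hash (assumed numbering).
AMP_OF_HASH = {602: 2, 778: 4}

# USI subtable rows: (subtable row hash, Cust) -> base-table RowID.
USI_SUBTABLE = {2: {(602, 56): (778, 7), (135, 98): (555, 6)}}

# Base table rows: RowID -> (Cust, Name, Phone).
BASE_TABLE = {
    4: {
        (778, 3): (95, "Peters", "555-7777"),
        (778, 7): (56, "Smith", "555-7777"),
    }
}


def usi_lookup(cust):
    """Return the base row for a USI value and the AMPs visited on the way."""
    touched = []
    sub_hash = USI_HASH[cust]                          # PE hashes the USI value
    sub_amp = AMP_OF_HASH[sub_hash]                    # first AMP: USI subtable
    touched.append(sub_amp)
    row_id = USI_SUBTABLE[sub_amp][(sub_hash, cust)]   # base-table RowID
    base_amp = AMP_OF_HASH[row_id[0]]                  # second AMP: base table
    touched.append(base_amp)
    return BASE_TABLE[base_amp][row_id], touched


row, amps = usi_lookup(56)
print(row, amps)
```

Because the subtable row is placed by the hash of the USI value and the base row by the hash of the primary index, the two steps generally land on different AMPs, which is why USI access is described as a 2 AMP operation.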

    Non-Unique Secondary Index (NUSI) Access

    Create NUSI

    CREATE INDEX (Name) ON Customer;

    Access via NUSI

    SELECT *
    FROM Customer
    WHERE Name = 'Adams';

    Customer: Table ID = 100. Name is the NUSI; Phone is the NUPI.

    PE: NUSI Value = 'Adams' -> Hashing Algorithm -> Table ID 100, Row Hash 567, NUSI Value Adams -> to MPL -> all AMPs

    Each AMP searches its own NUSI subtable; the RowIDs it finds point to base table rows on the same AMP.

    AMP 1

    NUSI Subtable
    RowID   Name    RowID
    432, 3  Smith   884, 1
    567, 2  Adams   471, 1   717, 2
    852, 1  Brown   555, 6

    Base Table
    RowID   Cust  Name    Phone
    471, 1  45    Adams   444-6666
    555, 6  98    Brown   333-9999
    717, 2  72    Adams   666-7777
    884, 1  74    Smith   555-6666

    AMP 2

    NUSI Subtable
    RowID   Name    RowID
    432, 1  Smith   147, 1
    448, 4  Black   822, 1
    567, 6  Jones   388, 1
    770, 1  Young   147, 2

    Base Table
    RowID   Cust  Name    Phone
    147, 1  49    Smith   111-6666
    147, 2  12    Young   777-4444
    388, 1  27    Jones   222-8888
    822, 1  62    Black   444-5555

    AMP 3

    NUSI Subtable
    RowID   Name    RowID
    432, 8  Smith   640, 1
    448, 1  White   107, 1
    567, 3  Adams   638, 1
    656, 1  Rice    536, 5

    Base Table
    RowID   Cust  Name    Phone
    107, 1  37    White   555-4444
    536, 5  84    Rice    666-5555
    638, 1  31    Adams   111-2222
    640, 1  40    Smith   222-3333

    AMP 4

    NUSI Subtable
    RowID   Name    RowID
    155, 1  Marsh   915, 9
    396, 1  Peters  778, 3
    432, 5  Smith   778, 7
    567, 1  Jones   639, 1

    Base Table
    RowID   Cust  Name    Phone
    639, 1  77    Jones   777-6666
    778, 3  95    Peters  555-7777
    778, 7  56    Smith   555-7777
    915, 9  51    Marsh   888-2222
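The all-AMP nature of NUSI access can be sketched in Python from the figure's base table rows. Assumptions: AMP numbers follow the order in which the figure lists the base table groups, the NUSI subtables are rebuilt here as plain per-AMP dictionaries rather than hashed rows, and the names build_nusi_subtables and nusi_lookup are hypothetical.

```python
# Base table rows per AMP: RowID -> (Cust, Name, Phone), from the figure.
BASE_TABLE = {
    1: {(471, 1): (45, "Adams", "444-6666"), (555, 6): (98, "Brown", "333-9999"),
        (717, 2): (72, "Adams", "666-7777"), (884, 1): (74, "Smith", "555-6666")},
    2: {(147, 1): (49, "Smith", "111-6666"), (147, 2): (12, "Young", "777-4444"),
        (388, 1): (27, "Jones", "222-8888"), (822, 1): (62, "Black", "444-5555")},
    3: {(107, 1): (37, "White", "555-4444"), (536, 5): (84, "Rice", "666-5555"),
        (638, 1): (31, "Adams", "111-2222"), (640, 1): (40, "Smith", "222-3333")},
    4: {(639, 1): (77, "Jones", "777-6666"), (778, 3): (95, "Peters", "555-7777"),
        (778, 7): (56, "Smith", "555-7777"), (915, 9): (51, "Marsh", "888-2222")},
}


def build_nusi_subtables(base):
    """Build one AMP-local index per AMP: Name -> RowIDs on that same AMP."""
    subtables = {}
    for amp, rows in base.items():
        index = {}
        for row_id, (_cust, name, _phone) in rows.items():
            index.setdefault(name, []).append(row_id)
        subtables[amp] = index
    return subtables


NUSI = build_nusi_subtables(BASE_TABLE)


def nusi_lookup(name):
    """Ask every AMP; each one returns only rows it stores itself."""
    hits = []
    for amp in sorted(BASE_TABLE):
        for row_id in NUSI[amp].get(name, []):
            hits.append((amp, BASE_TABLE[amp][row_id]))
    return hits


adams = nusi_lookup("Adams")
print(adams)
```

Every AMP is consulted even though only two of them hold an Adams row, which is the cost the slide refers to as an all AMP operation.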

    Full Table Scans

    Every row of the table must be read.

    All AMPs scan their portion of the table in parallel.

    Fast and efficient on Teradata due to parallelism.

    Full table scans typically occur when either:

    An index is not used in the query

    An index is used in a non-equality test

    Customer table: Cust_ID (USI), Cust_Name (no index), Cust_Phone (NUPI)

    Examples of Full Table Scans:

    SELECT * FROM Customer WHERE Cust_Phone LIKE '524-____';

    SELECT * FROM Customer WHERE Cust_Name = 'Davis';

    SELECT * FROM Customer WHERE Cust_ID > 1000;

    Partitioned Primary Indexes (PPI)

    What is a Partitioned Primary Index or PPI?

    A new indexing mechanism in Teradata.

    Data rows can be grouped into partitions at the AMP level.

    What advantages does a PPI provide?

    Increases the available options to improve the performance of certain types of queries.

    Only the rows of the qualified partitions in a query need to be accessed, which avoids full table scans.

    Types of Partition Primary Index :

    Range Based Partition and Case Based Partition.

    As always, data is distributed among AMPs and automatically placed

    within partitions.

    In a table defined with a PPI, each row is uniquely identified by its Row Key.

    Row Key = Partition # + Row Hash + Uniqueness Value

    Logical Example of NPPI versus PPI

    [Figure: 4 AMPs holding the Orders table, shown twice: once defined with a PPI on O_Date and once defined with an NPPI. Columns on each AMP: RH, O_#, O_Date. The rows cover O_# 1001 through 1048 and O_Date 02/09 through 02/12.]

    Example query: SELECT ... WHERE O_Date BETWEEN '2002-11-01' AND '2002-11-30';

    Partitioning with RANGE_N

    CREATE TABLE Sales
    ( store_id INTEGER NOT NULL,
      item_id INTEGER NOT NULL,
      sales_date DATE FORMAT 'YYYY-MM-DD',
      total_revenue DECIMAL(9,2),
      total_sold INTEGER)
    UNIQUE PRIMARY INDEX (store_id ,item_id ,sales_date)
    PARTITION BY RANGE_N (
      sales_date
      BETWEEN DATE '2003-01-01' AND DATE '2003-12-31'
      EACH INTERVAL '1' MONTH);

    Notes:

    Partition the current sales table into monthly partitions.

    Assume the current sales table only has data for the first 3 months of 2003, but we have defined partitions for the entire year 2003.

    It is relatively easy to ALTER the table to extend the partitions for 2004.

    A UPI is allowed because the partitioning columns are part of the PI.
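How a sales_date maps to a partition under this definition can be mimicked in Python. This is a minimal sketch, assuming partitions are numbered from 1 in range order and that a date outside the defined range gets no partition, since the DDL above defines no catch-all range; the function name range_n_month is hypothetical.

```python
from datetime import date


def range_n_month(d, start=date(2003, 1, 1), end=date(2003, 12, 31)):
    """Return the 1-based monthly partition number for d, or None.

    Assumes start is the first day of a month, so one-month intervals
    line up with calendar months.
    """
    if d is None or d < start or d > end:
        return None
    return (d.year - start.year) * 12 + (d.month - start.month) + 1


p_jan = range_n_month(date(2003, 1, 1))
p_mar = range_n_month(date(2003, 3, 15))
p_dec = range_n_month(date(2003, 12, 31))
p_out = range_n_month(date(2004, 1, 1))
print(p_jan, p_mar, p_dec, p_out)
```

With data only for the first three months of 2003, only partitions 1 through 3 hold rows; the other nine are defined but empty.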

    Partitioning with CASE_N

    Notes:

    Partition the data based on total revenue for the products.

    The NO CASE and UNKNOWN options allow for total_revenue >= 100,000 or unknown revenue.

    A UPI is NOT allowed because the partitioning columns are NOT part of the PI.

    CREATE TABLE Sales_Revenue
    ( store_id INTEGER NOT NULL,
      item_id INTEGER NOT NULL,
      sales_date DATE FORMAT 'YYYY-MM-DD',
      total_revenue DECIMAL(9,2),
      total_sold INTEGER)
    PRIMARY INDEX (store_id, item_id, sales_date)
    PARTITION BY CASE_N
    ( total_revenue < 2000 , total_revenue < 4000 ,
      total_revenue < 6000 , total_revenue < 8000 ,
      total_revenue < 10000 , total_revenue < 20000 ,
      total_revenue < 50000 , total_revenue < 100000 ,
      NO CASE,
      UNKNOWN);
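The partition chosen by this CASE_N list can be mimicked in Python. This is a minimal sketch, assuming the conditions are tested left to right with the first true one winning, and that NO CASE and UNKNOWN take the two partition numbers after the last condition; the function name case_n_partition is hypothetical and None stands in for a NULL total_revenue.

```python
LIMITS = (2000, 4000, 6000, 8000, 10000, 20000, 50000, 100000)


def case_n_partition(total_revenue, limits=LIMITS):
    """Return the 1-based partition number for a total_revenue value."""
    if total_revenue is None:
        return len(limits) + 2          # UNKNOWN: comparison with NULL
    for number, limit in enumerate(limits, start=1):
        if total_revenue < limit:
            return number               # first condition that is true
    return len(limits) + 1              # NO CASE: no condition matched


p_small = case_n_partition(1500)
p_edge = case_n_partition(2000)
p_large = case_n_partition(150000)
p_null = case_n_partition(None)
print(p_small, p_edge, p_large, p_null)
```

Because the conditions overlap (any value below 2000 is also below 4000), the left-to-right order is what turns them into non-overlapping revenue bands.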

    Join Index :
