
DBA Roles Teradata


  • 27th June 2013 http://www.teradatamagazine.com/v09n02/tech2tech/applied-solutions-3-whats-a-dba-to-do/

    A DATA WAREHOUSE FROM TERADATA ELIMINATES MANY MANUAL TASKS TYPICAL OF OTHER SYSTEMS.

    BY ROGER MANN

    What do experienced DBAs with an Oracle, IBM or Microsoft background need to know about managing a Teradata system? Basically, much less than they need to know about the others.

    The Teradata Database is a shared-nothing massively parallel processing (MPP) relational database management system (RDBMS), making it the only commercially available RDBMS designed from the ground up for data warehousing.

    Parallel processing and the automation of many typical DBA functions were created in the DNA of the Teradata Database. Because of that architecture, many functions that are manual or enhanced by wizards in other vendors' systems are managed automatically. Consequently, the roles and responsibilities of the DBA are significantly different. Fewer tasks are required, making the system much easier to manage. Understanding those differences and how to exploit them with the RDBMS is key to driving the success of the organization.

    The next sections break down how a data warehouse from Teradata differs from other systems, enabling the DBA to focus on more productive work and less on manually maintaining the data warehouse.

    Data warehouse performance is achieved in a parallel database architecture by the "divide and conquer" method. First, the data is divided into small, equal units. Then independent software modules process those units simultaneously (i.e., in parallel) to conquer the problems (i.e., answer the queries).

    The units of work and the hardware resources allocated to each parallel software module must be as equal as possible. Like a chain that is only as strong as its weakest link, the overall job cannot conclude until every unit has completed its processing. This "balanced processing" of workloads in the Teradata Database enables superior query performance.

    The DBA's objective, therefore, is to install a balanced processing platform, operating system (OS), database management software and disk subsystems. In a typical environment, these steps require careful planning with time-consuming analysis of user data and targeted queries. Anxious for a quick return on investment (ROI), however, management too often applies pressure to take shortcuts, which can have disastrous consequences.

    With a Teradata purpose-built platform, the OS, database and disk subsystems are installed before system delivery. All that is needed is the size of the raw user data, the number of concurrent users and some targeted queries. This information is input into a system-sizing calculator and a platform configuration

    What's a DBA to do?

    SYSTEM INSTALLATION

    Classic

    Dynamic Views template. Powered by Blogger.

    TERADATA http://tdpank.blogspot.in/search?updated-min=2013-01-01T00:00:00-08...

    1 of 45 8/16/2014 12:40 AM

  • [Image: entity-comparison table: http://www.teradatamagazine.com/tdmo_assets/tdmo_images/table.png]

    [Image: disk-organization figure: http://www.teradatamagazine.com/tdmo_assets/tdmo_images/chart.png]

    are free from the burden of having to understand and be responsible for setting the various options and initialization parameters with both the database and OS.

    Those runtime database control parameter settings are critical for performance tuning of transaction processing applications where queries are predetermined and must be tuned. However, a data warehouse is built on the concept that users are able to ask any question of any data at any time. There is no opportunity to know or tune the queries beforehand. This is best left to the Teradata Database to optimize and tune dynamically.

    In short, once the Teradata platform is installed, the DBA can immediately define the databases, users and tables and load data. Users can then leverage the data warehouse as it is intended: to run queries that answer their business intelligence (BI) questions.

    Because the Teradata storage subsystem is installed and balanced before delivery, management of the disk subsystem is greatly simplified. The DBA familiar with managing items such as disk groups, logical volumes, node groups, file systems, files and tablespaces will find that those entities and concepts are nonexistent. (See table.)

    All disk organization is entirely logical, as opposed to physical. (See figure.) Initially, all space in the system is allocated to a predefined system database called DBC. Using the CREATE DDL statement, the DBA will define DATABASE and USER entities. The space parameter on the CREATE statement is not a physical allocation but simply a size quota. If a database is allocated 5TB of space, that is the maximum amount of space the database is allowed to use. Anytime that database attempts to use more than 5TB, an out-of-space message will result. However, the system is not out of space; it just exceeded its space quota.
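    The quota mechanics above can be sketched in DDL. This is an illustration only; the database names and sizes are hypothetical, and a real system would also consider SPOOL and other parameters:

    ```sql
    -- Carve a logical 5TB quota out of DBC's space (names and sizes illustrative).
    CREATE DATABASE sales_dw FROM DBC
    AS PERM = 5000000000000;   -- a size quota, not a physical allocation

    -- A child database is in turn carved out of sales_dw's quota.
    CREATE DATABASE sales_stage FROM sales_dw
    AS PERM = 1000000000000;
    ```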

    A database management system (DBMS) is designed for storing and retrieving data. Typical file-system architectures fragment, and performance degrades over time as inserts, updates and deletes are applied to the data.

    Teradata broke all the rules with its file system design. Data is not stored in B-tree indexes based on data values; rather, the file system is built on raw disk slices. There are no pages, buffer pools, tablespaces, extents, etc. to

    STORAGE MANAGEMENT

    DISK FILE SYSTEM

    Dynamic Views template. Powered by Blogger.

    Classic

    TERADATA http://tdpank.blogspot.in/search?updated-min=2013-01-01T00:00:00-08...

    2 of 45 8/16/2014 12:40 AM

  • the DBA never has to do a reorganization, and performance is optimized on a continual basis and does not degrade with file updates.

    Defining users is easy with a Teradata system. There are two types: query users, with no workspace capability; and power users, who have the capability to create and manipulate tables within their own workspace based on whatever limitations the DBA placed on their space usage.

    The DBA first adds the users with a CREATE USER DDL statement, then grants them security rights to database entities. Role-based security is supported for ease of maintenance.
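    A sketch of those two steps; the user, role and object names are hypothetical:

    ```sql
    -- A query-only user: PERM = 0 means no workspace of their own.
    CREATE USER query_user1 FROM sales_dw
    AS PERM = 0,
       PASSWORD = temp_pw_1;

    -- Role-based security: grant rights to a role once, then grant the role to users.
    CREATE ROLE report_readers;
    GRANT SELECT ON sales_dw TO report_readers;
    GRANT report_readers TO query_user1;
    ```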

    DBAs need to resist the temptation to over-index the tables. Because of the powerful parallel architecture of the Teradata platform, it is unnecessary to avoid full-table scans. Therefore, far fewer indexes are needed than in other RDBMSs. In fact, as experienced in several organizations, tables having more than 80 billion rows each can be scanned in less than five minutes.

    These recommended steps will help determine the number of indexes needed in the Teradata Database:

    The DBA defines a primary index (PI) and a secondary index on any column that will participate as a foreign key in join operations. The PI is for data distribution and keyed access.

    Once the indexes are defined, the query workload is run and the query capture facility logs the query activity.

    The Teradata Index Wizard uses this information to recommend the addition or removal of indexes based on actual query usage.

    This process saves the DBA from having to manually analyze the number of indexes.

    Special indexes are available for specific performance needs. The partitioned primary index (PPI), for one, can deliver dramatic results. A large transaction file that is accessed with date parameter queries is a good candidate for creating a PPI on transaction date. The Optimizer then can eliminate partitions on any date-sensitive queries with dramatic response-time reductions.
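    A sketch of such a date-partitioned transaction table; the table layout and date range are assumed for illustration:

    ```sql
    CREATE TABLE txn_history (
        txn_id    BIGINT NOT NULL,
        acct_id   INTEGER NOT NULL,
        txn_date  DATE NOT NULL,
        amount    DECIMAL(12,2)
    )
    PRIMARY INDEX (txn_id)
    PARTITION BY RANGE_N (
        txn_date BETWEEN DATE '2013-01-01' AND DATE '2013-12-31'
        EACH INTERVAL '1' DAY
    );

    -- A predicate such as WHERE txn_date = DATE '2013-06-27' lets the
    -- Optimizer read one partition instead of scanning the whole table.
    ```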

    The multi-level PPI, join, aggregate and aggregate join indexes are tools that can turbo-charge certain applications.

    Teradata Active System Management provides the necessary tools for comprehensive workload management; therefore, no outside tools or resources are needed. The product has three components:

    Dynamic Query Manager enables the DBA to classify and govern the query before its execution in the database.

    Priority Scheduler defines resource partitions where varying workloads can be controlled and monitored.

    Database Query Log provides post-execution performance analysis.

    Because Teradata tools administer all performance management and tuning needs, the DBA no longer has to be an expert in the OS, database and third-party tools.

    USER MANAGEMENT

    INDEX MANAGEMENT

    WORKLOAD MANAGEMENT


  • Expanding the Teradata system is similar to ordering the initial platform. With assistance from Teradata Professional Services, the DBA determines the amount of raw data on the existing Teradata system and the number of concurrent users, as well as a few critical queries. Then those numbers are added to the anticipated growth in each area. These values are input into a system-sizing calculator, which produces the additional platform requirements.

    As with the original Teradata system, the additional platform and software will be pre-configured, delivered, installed and connected to the existing platform. The data redistribution utility is then run, which automatically rebalances the data on the system. The tool relocates any data that would have belonged on the new nodes had they been installed when the data was loaded. Once that data is relocated, the utility removes it from the old nodes. (No data movement occurs between the original nodes.) Redistributing the data requires downtime on the system, but the process normally takes less than a shift to complete.

    The features of the Teradata system make it ideal for data warehouse applications. The balanced, purpose-built platform arrives ready to deliver the first application generally in days, instead of weeks or months.

    With the automated data management features, DBAs are freed from having to micro-manage the file system and can, therefore, engage in other tasks and responsibilities. For instance, instead of the DBA constantly writing and tuning queries, the query optimizer allows the user to ask any question, anytime. The support and freedom provided by a data warehouse from Teradata empowers DBAs to concentrate on working with the user community to deliver greater business value to their organization.

    Posted 27th June 2013 by pankaj agarwal

    A BETTER VALUE

    WHAT TERADATA DBAs DON'T DO:

    With the automatic features included in a Teradata Database, DBAs have fewer tasks and responsibilities for implementing and maintaining the system. As identified in this partial list of duties, Teradata DBAs have never been required to:

    Install an operating system (OS)

    Understand and set extensive OS tuning parameters

    Install the Teradata Database

    Understand and set extensive Teradata Database parameters

    Write programs/execute utilities that determine how to divide data into partitions

    Determine size and physical location of each table and index partition or simple tablespace

    Code/allocate/format partitions or underlying file structures

    Embed partition assignment into CREATE TABLE statements

    Determine level/degrees of parallelism to be assigned to tables/partitions/databases

    Assign and manage special buffer pools for parallel processing

    Associate tables/queries with parallel degrees

    Code/allocate/format temporary work space


  • 25th March 2013

  • Join Indexes

    A join index joins two tables together and keeps the result set in the permanent space of Teradata. At the time of a join, the parsing engine (PE) decides whether it is faster to build the result set from the actual base tables or from the join index. Users never directly query a join index. In that sense, a join index is the stored result of joining two tables, kept so that the PE can take the result set from the join index instead of performing the join on the base tables. Types of join index:

    1. Multi-table join index: Suppose we have two base tables, Employee and Dept, which hold the data of employees and departments respectively. A join index on these two tables would look like:

    Create Join Index emp_dept as
    Select empno, empname, emp_dept, emp_sal, emp_mgr
    From employee e inner join dept d
    On e.emp_dept = d.deptno
    Unique primary index (empno);

    This way the join index EMP_DEPT holds the result set of the two base tables, and at join time the PE decides whether it is faster to join the actual tables or to take the result set from this join index. So always choose the list of columns and tables for a join index wisely.

    2. Single-table join index: A single-table join index duplicates a single table but changes the primary index. Users still query only the base table; it is the PE that decides which result set is faster, the join index or the actual base table. The reason to create a single-table join index is so joins can be performed faster, because no redistribution or duplication needs to occur.

    Create Join Index emp_snap as
    Select empno, empname, emp_dept
    From employee
    Primary index (emp_dept);

    3. Aggregate join index: An aggregate join index allows the tracking of SUM and COUNT (and, through them, averages) on any table. This join index is used when we need to perform an aggregate function on the data of the table.

    Create Join Index AGG_TABLE as
    Sel empno, sum(emp_sal)
    From employee
    Group by empno;



  • 2. Users never query them directly; it is the PE that decides which result set to use.
    3. They are updated when the base tables are changed.
    4. They cannot be loaded with FastLoad or MultiLoad.

    Posted 25th March 2013 by pankaj agarwal


    21st March 2013

    The very mention of changing data on disk implies that space must be managed by the AMP(s) owning the row(s) to modify. Data cannot be changed unless it is read from the disk.

    For INSERT operations, a new block might be written or an existing block might be modified to contain the new data row. The choice of which to use depends on whether or not there is sufficient space on the disk to contain the original block plus the number of bytes in the new row.

    If the new row causes the block to increase beyond the current number of sectors, the AMP must locate an empty slot with enough contiguous sectors to hold the larger block. Then, it can allocate this new area for the larger block.

    A DELETE is going to make one or more blocks shorter. Therefore, it should never have to find a larger slot in which to write the block back to disk. However, it still has to read the existing block, remove the appropriate rows and re-write the smaller block.

    The UPDATE is more unpredictable than either the DELETE or the INSERT. This is because an UPDATE might increase the size of the block like the INSERT, decrease the size like the DELETE or not change the size at all.

    A larger block might occur because of one of the following conditions:

    A NULL value was compressed and now must be expanded to contain a value. This is the most likely situation.

    A longer character literal is stored into a VARCHAR column.
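    Both growth conditions can be seen in one hypothetical table definition; COMPRESS here stores NULLs outside the row body, so giving the column a value later enlarges the row:

    ```sql
    CREATE TABLE emp_notes (
        empno  INTEGER NOT NULL,
        bonus  DECIMAL(10,2) COMPRESS,   -- NULLs compressed out of the row
        note   VARCHAR(200)
    )
    PRIMARY INDEX (empno);

    -- Either UPDATE below can make the row, and hence its block, larger:
    UPDATE emp_notes SET bonus = 500.00 WHERE empno = 1001;
    UPDATE emp_notes SET note  = 'a much longer note than before' WHERE empno = 1001;
    ```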


    Performance Issues With Data Maintenance


  • A block size does not change when:

    The column is a fixed-length CHAR; regardless of the length of the actual character data value, the length stays at the maximum defined.

    All numeric columns are stored in their maximum number of bytes.

    There are many reasons for performance gains or losses. Another consideration, which was previously mentioned, is the journal entries for the Transient Journal for recovery and rollback processing. The Transient Journal is mandatory and cannot be disabled. Without it, data integrity cannot be guaranteed.

    Using FALLBACK on tables negatively impacts the processing time when changing rows within a table. This is due to the fact that the same change must also be made on the AMP storing the FALLBACK copy of the row(s) involved. These changes involve additional disk I/O operations and the use of two AMPs instead of one for each row INSERT, UPDATE, or DELETE. That equates to twice as much I/O activity.

    Using PERMANENT JOURNAL logging on tables also negatively impacts the processing time when changing rows within a table. This is due to the fact that the UPDATE processing also inserts a copy of the row into the journal table. If BEFORE journals are used, a copy of the row as it existed before a change is placed into the log table. When AFTER images are requested, a copy of the row is inserted into the journal table that looks exactly like the changed row.

    There is another issue to consider for journaling, based on SINGLE or DUAL journaling. DUAL asks for a second (mirror) copy to be inserted. It is the journal's way to provide FALLBACK copies without the table being required to use FALLBACK. The caution here is that if the table is FALLBACK protected, so are the journals. This will further impact the performance of the row modification.
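    The protection options discussed above are declared on the table. A sketch, with hypothetical names, assuming the containing database already has a permanent journal table defined:

    ```sql
    CREATE TABLE txn_protected,
        FALLBACK,               -- mirror copy on another AMP: doubles change I/O
        DUAL BEFORE JOURNAL     -- two before-image copies per modified row
    (
        txn_id  BIGINT NOT NULL,
        amount  DECIMAL(12,2)
    )
    PRIMARY INDEX (txn_id);
    ```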

    In Teradata, all tables must have a Primary Index (PI). It is a normal and very important part of the storage and retrieval of rows for all tables. Therefore, there is no additional overhead processing involved in an INSERT or DELETE operation for Primary Indices.

    However, when using an UPDATE and the data value of a PI is changed, there is more processing required than when changing the content of any other column. This is due to the fact that the original row must be read, literally deleted from the current AMP and rehashed, redistributed and inserted on another AMP based on

    Impact of FALLBACK on Row Modification

    Impact of PERMANENT JOURNAL Logging on Row Modification

    Impact of Primary Index on Row Modification


  • successfully complete the operation when a PI is the column being modified.

  • In Teradata, a Secondary Index is optional. Currently, a table may have 32 secondary indices. Each index may be a combination of up to 16 columns within a table. Every unique data value in a defined index has a row in the subtable, and potentially one on each AMP for a NUSI (Non-Unique Secondary Index). Additionally, every index has its own subtable.

    Using secondary indices on tables may also negatively impact the processing time when changing rows within a table. This is due to the fact that when a column is part of an index and its data value is changed in the base table, the index value must also be changed in the subtable. This normally requires that a row be read, deleted and inserted into a subtable when the column is involved in a USI (Unique Secondary Index). Remember that the delete and insert are probably on different AMPs.

    For a NUSI, the processing all takes place on the same AMP. This is referred to as AMP Local. At first glance this sounds like a good thing. However, the processing requires a read of the old NUSI, a modification, and a rewrite. Then, most likely it will be necessary to insert an index row into the subtable. However, if the NUSI already exists, Teradata needs to read the existing NUSI, append the new data value to it and re-write it back into the subtable. This is why it is important not to create a Primary Index or a Secondary Index on data that often changes.

    The point of this discussion is simple. If secondary indices are used, additional processing is involved when the data value of the index is changed. This is true on an INSERT, a DELETE and an UPDATE. So, if a secondary index is defined, make sure that the SQL is using it to receive the potential access speed benefit. An EXPLAIN can provide this information. If it is not being used, drop the index.
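    Checking whether an index earns its keep, as suggested above; the table, column and NUSI here are hypothetical:

    ```sql
    -- If the plan shows a full-table scan instead of the secondary index,
    -- the index adds maintenance overhead without any retrieval benefit.
    EXPLAIN
    SELECT last_name
    FROM   employee
    WHERE  dept_no = 402;   -- dept_no assumed to carry a NUSI

    -- If the index is not being used, drop it:
    -- DROP INDEX (dept_no) ON employee;
    ```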

    As an added note to consider, when using composite secondary indices, the same column can be included in multiple indices. When this is the case, any data value change requires multiple subtable changes. The result is that the number of indices in which it is defined multiplies the previous AMP and subtable-processing overhead. Therefore, it becomes more important to choose columns with a low probability of change.

    Posted 21st March 2013 by pankaj agarwal

    Impact of Secondary Indices on Row Modification



  • SELECT * FROM DBC.DBCINFO;

    The access language for all modern relational database systems (RDBMS) is Structured Query Language (SQL). It has evolved over time to be the standard. The ANSI SQL group defines which commands and functionality all vendors should provide within their RDBMS.

    There are three levels of compliance within the standard: Entry, Intermediate and Full. The three level definitions are based on specific commands, data types and functionalities. So, it is not that a vendor has incorporated some percentage of the commands; it is more that each command is categorized as belonging to one of the three levels. For instance, most data types are Entry level compliant. Yet, there are some that fall into the Intermediate and Full definitions.

    Since the standard continues to grow with more options being added, it is difficult to stay fully ANSI compliant. Additionally, all RDBMS vendors provide extra functionality and options that are not part of the standard. These extra functions are called extensions because they extend or offer a benefit beyond those in the standard definition.

    At the writing of this book, Teradata was fully ANSI Entry level compliant based on the 1992 Standards document. NCR also provides much of the Intermediate and some of the Full capabilities. This book indicates feature by feature which SQL capabilities are ANSI and which are Teradata specific, or extensions. It is to NCR's benefit to be as compliant as possible in order to make it easier for customers of other RDBMS vendors to port their data warehouse to Teradata.

    As indicated earlier, SQL is used to access, store, remove and modify data stored within a relational database, like Teradata. SQL actually comprises three types of statements: Data Definition Language (DDL), Data Control Language (DCL) and Data Manipulation Language (DML). The primary focus of this book is on DML and DDL. Both DDL and DCL are, for the most part, used for administering an RDBMS. Since the SELECT statement is used the vast majority of the time, we are concentrating on its functionality, variations and capabilities.

    Everything in the first part of this chapter describes ANSI standard capabilities of the SELECT command. As the statements become more involved, each capability will be designated as either ANSI or a Teradata extension.

    Using the SELECT has been described as being like playing the game Jeopardy. The answer

    Determining the Release of Your Teradata System:

    Fundamental Structured Query Language (SQL)

    Basic SELECT Command


  • case of the statement is not important. The SQL statements can be written using all uppercase, lowercase or a combination; it does not matter to the Teradata PE.

    The SELECT is used to return the data value(s) stored in the columns named within the SELECT command. The requested columns must be valid names defined in the table(s) listed in the FROM portion of the SELECT.

    The following shows the format of a basic SELECT statement. In this book, the syntax uses expressions like <column name> (see Figure 1-1) to represent the location of one or more names required to construct a valid SQL statement:

    The structure of the above command places all keywords on the left in uppercase and the variable information, such as column and table names, to the right. Like using capital letters, this positioning is to aid in learning SQL. Lastly, although the use of SEL is acceptable in Teradata, with [ECT] in square brackets being optional, it is not ANSI standard.


  • SEL[ECT] <column name(s)>
    FROM <table name> ;

    Both of these SELECT statements produce the same output report, but the above style is easier to read and debug for complex queries. The output display might appear as:

    3 Rows Returned

    aaaaaaaaaaaaaaaaaa

    bbbbbbbbbbbbbbbb

    cccccccccccccccccc

    In the output, the column name becomes the default heading for the report. Then, the data contained in the selected column is displayed once for each row returned.

    The next variation of the SELECT statement returns all of the columns defined in the table indicated in the FROM portion of the SELECT.


  • The output of the above request uses each column name as the heading, and the columns are displayed in the same sequence as they are defined in the table. Depending on the tool used to submit the request, care should be taken: if the returned display is wider than the media (i.e., terminal = 80 and paper = 133 characters), it may be truncated.

    At times, it is desirable to select the same column twice. This is permitted and, to accomplish it, the column name is simply listed in the SELECT column list more than once. This technique might often be used when doing aggregations or calculating a value; both are covered in later chapters.
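    For instance, using the Student table from this chapter, the same column can be listed twice, once raw and once in a calculation (the derived expression is only an illustration):

    ```sql
    SELECT Grade_Pt,
           Grade_Pt * 25 AS Pct_Of_Max
    FROM   Student_Table;
    ```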

    The table below is used to demonstrate the results of various requests. It is a small table with a total of ten rows for easy comparison.

    Student Table - contains 10 students

    Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
    (PK)                                (FK)
    123250      Phillips    Martin      SR          3.00
    125634      Hanson      Henry       FR          2.88
    231222      Wilson      Susie       SO          3.80
    234121      Thomas      Wendy       FR          4.00
    260000      Johnson     Stanley     ?           ?
    280023      McRoberts   Richard     JR          1.90
    322133      Bond        Jimmy       JR          3.95
    324652      Delaney     Danny       SR          3.35
    333450      Smith       Andy        SO          2.00
    423400      Larkins     Michael     FR          0.00

    Figure 2-1

  • For example, the next SELECT might be used with Figure 2-1 to display the student number, the last name, first name, the class code and grade point for all of the students in the Student table:

    SELECT *
    FROM Student_Table ;

    10 Rows returned

    Student_ID Last_Name First_Name Class_Code Grade_Pt

    423400 Larkins Michael FR 0.00

    125634 Hanson Henry FR 2.88

    280023 McRoberts Richard JR 1.90

    260000 Johnson Stanley ? ?

    231222 Wilson Susie SO 3.80

    234121 Thomas Wendy FR 4.00

    324652 Delaney Danny SR 3.35

    123250 Phillips Martin SR 3.00

    322133 Bond Jimmy JR 3.95

    333450 Smith Andy SO 2.00

    Notice that Johnson has question marks in the grade point and class code columns. Most client software uses the question mark to represent missing data or an unknown value (NULL). More discussion on this condition will appear throughout this book. The other thing to note is that character data is aligned from left to right, the same as we read it, and numeric data from right to left, from the decimal.

    This SELECT returns all of the columns except the Student ID from the Student table:

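    The statement itself appeared as an image in the original page; a reconstruction consistent with the column list of the output below:

    ```sql
    SELECT First_Name,
           Last_Name,
           Class_Code,
           Grade_Pt
    FROM   Student_Table;
    ```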

  • 10 Rows returned

    First_Name Last_Name Class_Code Grade_Pt

    Michael Larkins FR 0.00

    Henry Hanson FR 2.88

    Richard McRoberts JR 1.90

    Stanley Johnson ? ?

    Susie Wilson SO 3.80

    Wendy Thomas FR 4.00

    Danny Delaney SR 3.35

    Martin Phillips SR 3.00

    Jimmy Bond JR 3.95

    Andy Smith SO 2.00

There is no shortcut for selecting all columns except one or two. Also, notice that the columns are displayed in the output in the same sequence they are requested in the SELECT statement.

The previous unconstrained SELECT statement returned every row from the table. Since the Teradata database is most often used as a data warehouse, a table might contain millions of rows. So, it is wise to request only certain types of rows for return.

By adding a WHERE clause to the SELECT, a constraint is established to potentially limit which rows are returned, based on a TRUE comparison to specific criteria or a set of conditions.

    WHERE Clause


The conditional check in the WHERE can use the ANSI comparison operators (the symbols are ANSI; the alphabetic forms are a Teradata Extension):

Comparison              ANSI Symbol   Teradata Extension
Equal                   =             EQ
Not Equal               <>            NE
Less Than               <             LT
Greater Than            >             GT
Less Than or Equal      <=            LE
Greater Than or Equal   >=            GE

Figure 2-2

The following SELECT can be used to return the students with a B (3.0) average or better from the Student table:
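The original statement was lost in extraction; a reconstruction consistent with the three-column, five-row result that follows is:

```sql
SELECT Student_ID
      ,Last_Name
      ,Grade_Pt
FROM   Student_Table
WHERE  Grade_Pt >= 3.0 ;
```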


5 Rows returned

    Student_ID Last_Name Grade_Pt

    231222 Wilson 3.80

    234121 Thomas 4.00

    324652 Delaney 3.35

    123250 Phillips 3.00

    322133 Bond 3.95

Without the WHERE clause, the AMPs return all of the rows in the table to the user. More and more Teradata user systems are getting to the point where they are storing billions of rows in a single table. There must be a very good reason for needing to see all of them. More simply put, you will always use a WHERE clause whenever you want to see only a portion of the rows in a table.

Compound Comparisons ( AND / OR )


The following is the syntax for using the AND logical operator:
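The syntax diagram itself did not survive extraction; a generic sketch of the form it described (placeholders in angle brackets) is:

```sql
SELECT <column-list>
FROM   <table-name>
WHERE  <column> <comparison> <value>
AND    <column> <comparison> <value> ;
```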

Notice that the column name is listed for each comparison, separated by a logical operator; this will be true even when it is the same column being compared twice. The AND signifies that each individual comparison on both sides of the AND must be true. The final result of the comparison must be TRUE for a row to be returned.

This Truth Table illustrates this point using AND.

First Test Result AND Second Test Result Final Result

True True True

True False False

False True False

False False False


Remember that a column in a given row can never contain more than a single data value. Therefore, it does not make good sense to issue the next SELECT using an AND on the same column, because no rows will ever be returned.
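The doomed statement, reconstructed from the discussion that follows (no student can have both a 3.0 and a 4.0 average), would look something like:

```sql
SELECT *
FROM   Student_Table
WHERE  Grade_Pt = 3.0
AND    Grade_Pt = 4.0 ;
```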

    No rows found

The above SELECT will never return any rows. It is impossible for a column to contain more than one value: no student has a 3.0 grade average AND a 4.0 average. A column might contain one or the other, but never both at the same time. The AND operator indicates both must be TRUE, and it should never be used between two comparisons on the same column.

By substituting an OR logical operator for the previous AND, rows will now be returned.

    The following is the syntax for using OR:
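The statement itself was lost in extraction; a reconstruction consistent with the two-row result below (Thomas with 4.00 and Phillips with 3.00) is:

```sql
SELECT Student_ID
      ,Last_Name
      ,First_Name
      ,Grade_Pt
FROM   Student_Table
WHERE  Grade_Pt = 3.0
OR     Grade_Pt = 4.0 ;
```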


2 Rows returned

    Student_ID Last_Name First_Name Grade_Pt

    234121 Thomas Wendy 4.00

    123250 Phillips Martin 3.00

The OR signifies that only one of the comparisons on either side of the OR needs to be true for the entire test to result in a true and the row to be selected.

    This Truth Table illustrates the results for the OR:

    First Test Result OR Second Test Result Final Result

    True True True

    True False True

    False True True

    False False False

    Figure 2-4

When using the OR, the same column or different column names may be used. In this case, it makes sense to use the same column, because a row is returned when a column contains either of the specified values, as opposed to both values as seen with AND.

It is perfectly legal and common practice to combine the AND with the OR in a single SELECT statement.
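The statement discussed next was lost in extraction. A reconstruction consistent with the discussion (the AND is evaluated first, so every 3.0 row returns along with freshmen holding a 4.0) is:

```sql
SELECT Student_ID
      ,Last_Name
      ,First_Name
      ,Class_Code
      ,Grade_Pt
FROM   Student_Table
WHERE  Grade_Pt = 3.0
OR     Grade_Pt = 4.0 AND Class_Code = 'FR' ;
```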


2 Rows returned

    Student_ID Last_Name First_Name Class_Code Grade_Pt

    234121 Thomas Wendy FR 4.00

    123250 Phillips Martin SR 3.00

At first glance, it appears that the comparison worked correctly. However, upon closer evaluation it is incorrect, because Phillips is a senior and not a freshman.

When mixing AND with OR in the same WHERE clause, it is important to know that the AND is evaluated first. The previous SELECT actually returns all rows with a grade point of 3.0; hence, Phillips was returned. The second comparison returned Thomas, with a grade point of 4.0 and a class code of FR.

When it is necessary for the OR to be evaluated before the AND, the use of parentheses changes the priority of evaluation. A different result is seen when doing the OR first. Here is how the statement should be written:
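A reconstruction consistent with the single-row result below, with parentheses forcing the OR to be evaluated first, is:

```sql
SELECT Last_Name
      ,Class_Code
      ,Grade_Pt
FROM   Student_Table
WHERE  (Grade_Pt = 3.0 OR Grade_Pt = 4.0)
AND    Class_Code = 'FR' ;
```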


1 Row returned

    Last_Name Class_Code Grade_Pt

    Thomas FR 4.00

    Now, only Thomas is returned and the output is correct.

NULL is an SQL reserved word. It represents missing or unknown data in a column. Since NULL is an unknown value, a normal comparison cannot be used to determine whether it is true or false. All comparisons of any value to a NULL result in an unknown; it is neither true nor false. The only valid test for a null uses the keyword NULL without the normal comparison symbols, as explained in this chapter.

When a table is created in Teradata, the default for a column is to allow a NULL value to be stored. So, unless the default is overridden and NULL values are not allowed, it is a good idea to understand how they work.

A SHOW TABLE command (chapter 3) can be used to determine whether a NULL is allowed. If the column contains a NOT NULL constraint, you need not be concerned about the presence of a NULL because it is disallowed.

Impact of NULL on Compound Comparisons

This AND Truth Table must now be used for compound tests when NULL values are allowed:

First Test Result AND Second Test Result Final Result


True Unknown Unknown

Unknown True Unknown

False Unknown False

Unknown False False

Unknown Unknown Unknown

    Figure 2-5

This OR Truth Table must now be used for compound tests when NULL values are allowed:

    First Test Result OR Second Test Result Final Result

    True Unknown True

    Unknown True True

    False Unknown Unknown

    Unknown False Unknown

    Unknown Unknown Unknown

    Figure 2-6

For most comparisons, an unknown (null) is functionally equivalent to a false because it is not a true. Therefore, when using any comparison symbol, a row is not returned when it contains a NULL.

At the same time, the next SELECT does not return Johnson because all comparisons against a NULL are unknown:
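The statement is missing from the scrape. Given the error message that follows, it presumably compared a column to NULL with an ordinary operator, for example:

```sql
SELECT *
FROM   Student_Table
WHERE  Grade_Pt = NULL ;
```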


No rows found

V2R5: *** Failure 3731 The user must use IS NULL or IS NOT NULL to test for NULL values.

As seen in the above Truth tables, a comparison test cannot be used to find a NULL.

To find a NULL, it becomes necessary to make a slight change in the syntax of the conditional comparison. The coding necessary to find a NULL is seen in the next section.

It can be fairly straightforward to request exactly which rows are needed. However, sometimes rows are needed that contain any value other than a specific value. When this is the case, it might be easier to write the SELECT to find what is not needed instead of what is needed, and then convert it to return everything else. This might be the situation when there are 100 potential values stored in the database table and 99 of them are needed. So, it is easier to eliminate the one value than it is to specifically list the desired 99 different values individually.

Either of the next two SELECT formats can be used to accomplish the elimination of the one value:

    Using NOT in SQL Comparisons
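The two formats were lost in extraction. Based on the explanation that follows, they were presumably a NOT in front of a parenthesized comparison versus a negated comparison operator; for example (the class-code value here is illustrative):

```sql
SELECT Last_Name
FROM   Student_Table
WHERE  NOT (Class_Code = 'FR') ;

SELECT Last_Name
FROM   Student_Table
WHERE  Class_Code <> 'FR' ;
```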


This second version of the SELECT is normally used when compound conditions are required. This is because it is usually easier to code the SELECT to get what is not wanted, then to enclose the entire set of comparisons in parentheses and put one NOT in front of it. Otherwise, with a single comparison, it is easier to put NOT in front of the comparison operator without requiring the use of parentheses.

The next SELECT uses the NOT with an AND comparison to display seniors and lower classmen with grade points less than 3.0:
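A reconstruction consistent with the six-row result below, and with the conversion described afterward (each comparison reversed and the AND becoming an OR), is:

```sql
SELECT Last_Name
      ,First_Name
      ,Class_Code
      ,Grade_Pt
FROM   Student_Table
WHERE  NOT (Class_Code <> 'SR' AND Grade_Pt >= 3.0) ;
```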


6 Rows returned

    Last_Name First_Name Class_Code Grade_Pt

    McRoberts Richard JR 1.90

    Hanson Henry FR 2.88

    Delaney Danny SR 3.35

    Larkins Michael FR 0.00

    Phillips Martin SR 3.00

    Smith Andy SO 2.00

Without using the above technique of a single NOT, it is necessary to change every individual comparison. The following SELECT shows this approach; notice the other change necessary below: NOT AND is an OR.

Since you cannot have conditions like NOT >= and NOT <>, they must be converted to < (the opposite of >=) and = (the opposite of <>). It returns the same six rows, but also notice that the AND is now an OR:
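The converted statement, reconstructed to match the six rows returned below, with each comparison reversed and the AND changed to an OR:

```sql
SELECT Last_Name
      ,First_Name
      ,Class_Code
      ,Grade_Pt
FROM   Student_Table
WHERE  Class_Code = 'SR'
OR     Grade_Pt < 3.0 ;
```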


6 Rows returned

    Last_Name First_Name Class_Code Grade_Pt

    McRoberts Richard JR 1.90

    Hanson Henry FR 2.88

    Delaney Danny SR 3.35

    Phillips Martin SR 3.00

    Larkins Michael FR 0.00

    Smith Andy SO 2.00

Chart of individual conditions and NOT:

Condition   Opposite condition   NOT condition
=           <>                   NOT =
<>          =                    NOT <>
<           >=                   NOT <
>           <=                   NOT >
<=          >                    NOT <=
>=          <                    NOT >=
AND         OR                   OR
OR          AND                  AND

Figure 2-7

To maintain the integrity of the statement, all portions of the WHERE must be changed, including AND as well as OR. The following two SELECT statements illustrate the same concept when using an OR:
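Both statements were lost in extraction. One plausible pair, chosen so that only Hanson (2.88) qualifies as in the single-row result below, is a NOT over an OR and its converted AND form:

```sql
SELECT Last_Name
FROM   Student_Table
WHERE  NOT (Grade_Pt <= 2.0 OR Grade_Pt >= 3.0) ;

SELECT Last_Name
FROM   Student_Table
WHERE  Grade_Pt > 2.0
AND    Grade_Pt < 3.0 ;
```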


1 Row returned

    Last_Name

    Hanson

In the earlier Truth table, the NULL value returned an unknown when checked with a comparison operator. When looking for specific conditions, an unknown was functionally equivalent to a false, but really it is an unknown.

These two Truth tables can be used together as a tool when mixing AND and OR together in the WHERE clause along with NOT.

    This Truth Table helps to gauge returned rows when using NOT with AND:

    First Test Result AND Second Test Result Result

    NOT(True) = False NOT(Unknown) = Unknown False

    NOT(Unknown) = Unknown NOT(True) = False False

    NOT(False) = True NOT(Unknown) = Unknown Unknown

    NOT(Unknown) = Unknown NOT(False) = True Unknown

    NOT(Unknown) = Unknown NOT(Unknown) = Unknown Unknown

Figure 2-8


This Truth Table helps to gauge returned rows when using NOT with OR:

First Test Result OR Second Test Result Result

NOT(True) = False NOT(Unknown) = Unknown Unknown

NOT(Unknown) = Unknown NOT(True) = False Unknown

    NOT(False) = True NOT(Unknown) = Unknown True

    NOT(Unknown) = Unknown NOT(False) = True True

    NOT(Unknown) = Unknown NOT(Unknown) = Unknown Unknown

    Figure 2-9

There is an issue associated with using NOT. When a NOT is done on a true condition, the result is a false. Likewise, the NOT of a false is a true. However, when a NOT is done with an unknown, the result is still an unknown. Whenever a NULL appears in the data for any of the columns being compared, the row will never be returned and the answer set will not be what is expected.

Another area where care must be taken is when allowing NULL values to be stored in one or both of the columns. As mentioned earlier, previous versions of Teradata had no concept of unknown, and if a compare didn't result in a true, it was false. With the emphasis on ANSI compatibility, the unknown was introduced.

If NULL values are allowed and there is potential for the NULL to impact the final outcome of compound tests, additional tests are required to eliminate them. One way to eliminate this concern is to never allow a NULL value in any columns. However, this may not be appropriate, and it will require more storage space because a NULL can be compressed. Therefore, when a NULL is allowed, the SQL needs to simply check for a NULL.

Therefore, using the expression IS NOT NULL is a good technique when NULL is allowed in a column and the NOT is used with a single or a compound comparison. This does require another comparison, and could be written as:
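A reconstruction consistent with the seven-row result below (the six rows from the earlier NOT example, plus Johnson, whose NULL grade point now makes the inner AND false rather than unknown) is:

```sql
SELECT Last_Name
      ,First_Name
      ,Class_Code
      ,Grade_Pt
FROM   Student_Table
WHERE  NOT (Class_Code <> 'SR'
        AND Grade_Pt >= 3.0
        AND Grade_Pt IS NOT NULL) ;
```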


7 Rows returned

    Last_Name First_Name Class_Code Grade_Pt

    Larkins Michael FR 0.00

    Hanson Henry FR 2.88

McRoberts Richard JR 1.90

    Johnson Stanley ? ?

    Delaney Danny SR 3.35

    Phillips Martin SR 3.00

    Smith Andy SO 2.00

Notice that Johnson came back this time, having not appeared previously because of the NULL values.

Later in this book, the COALESCE will be explored as another way to eliminate NULL values directly in the SQL instead of in the database.


21st March 2013

Although a relational data model uses Primary Keys and Foreign Keys to establish the relationships between tables, that design is a Logical Model. Each vendor uses specialized techniques to implement a Physical Model. Teradata does not use keys in its physical model. Instead, Teradata is implemented using indices, both primary and secondary.

The Primary Index (PI) is the most important index in all of Teradata. The performance of Teradata can be linked directly to the selection of this index. The data value in the PI column(s) is submitted to the hashing function. The resulting row hash value is used to map the row to a specific AMP for data distribution and storage.

To illustrate this concept, I have on several occasions used two decks of cards. Imagine, if you will, fourteen people in a room. To the largest, most powerful-looking man in the room, you give one of the decks of cards. His large hands allow him to hold all fifty-two cards at one time, with some degree of success. The cards are arranged with the ace of spades continuing through the king of spades in ascending order. After the spades are the hearts, then the clubs and, last, the diamonds. Each suit is arranged starting with the ace and ascending up to the king. The cards are partitioned by suit.

The other deck of cards is divided among the other thirteen people. Using this procedure, all cards with the same value (i.e., aces) go to the same person. Likewise, all the deuces, treys and subsequent cards each go to one of the thirteen people. Each set of four cards will be in the same order as the suits contained in the single deck that went to the lone man: spades, hearts, clubs and diamonds. Once all the cards have been distributed, each of the thirteen people will be holding four cards of the same value (4*13=52). Now, the game can begin.

The requests in this game come in the form of give-me one or more cards.

To make it easy for the lone player, we first request: give-me the ace of spades. The person with four aces finds their ace, as does the lone player with all 52 cards, both on the top of their cards. That was easy!

As the difficulty of the give-me requests increases, the level of difficulty dramatically increases for the lone man. For instance, when the give-me request is for all of the twos, one of the thirteen people holds up all four of their cards and the request is finished.

Use of an Index


Another request might be give-me all of the diamonds. For the thirteen people, each person locates and holds up one of their cards and the request is finished. For the lone person with the single deck, the request means finding and holding up the last thirteen cards in their deck of fifty-two. In each of these give-me requests, the lone man had to negotiate all fifty-two cards, while the thirteen other people only needed to determine which of their four cards applied to the request, if any. This is the same procedure used by Teradata. It divides up the data like we divided up the cards.

As illustrated, the thirteen people are faster than the lone man. However, the game is not limited to thirteen players. If there were 26 people who wished to play on the same team, the cards would simply need to be divided or distributed differently.

When using the value (ace through king), there are only 13 unique values. In order for 26 people to play, we need a way to come up with 26 unique values for 26 people. To make the cards more unique, we might combine the value of the card (i.e., ace) with the color. Therefore, we have two red aces and two black aces, as well as two sets for every other card. Now when we distribute the cards, each of the twenty-six people receives only two cards instead of the original four. The distribution is still based on fifty-two cards (2 times 26).

At the same time, the optimum number of people for the game is not 26. Based on what has been discussed so far, what is the optimum number of people?

    If your answer is 52, then you are absolutely correct.

With this many people, each person has one and only one card. Any time a give-me is requested of the participants, their one card either qualifies or it does not. It doesn't get any simpler or faster than this situation.

As easy as this sounds, to accomplish this distribution the value of the card alone is not sufficient to manifest 52 unique values. Neither is using the value and the color. That combination only gives us a distribution of 26 unique values when 52 unique values are desired.

To achieve this distribution, we need to establish still more uniqueness. Fortunately, we can use the suit along with the value. Therefore, the ace of spades is different from the ace of hearts, which is different from the ace of clubs and the ace of diamonds. In other words, there are now 52 unique identities to use for distribution.

To relate this distribution to Teradata, one or more columns of a table are chosen to be the Primary Index.

    Primary Index


To store the data, the value(s) in the PI are hashed via a calculation to determine which AMP will own the data. The same data values always hash to the same row hash and therefore are always associated with the same AMP.

The advantage to using up to sixteen columns is that row distribution is very smooth, or even, based on unique values. This simply means that each AMP contains the same number of rows. At the same time, there is a downside to using several columns for a PI. The PE needs every data value for each column as input to the hashing calculation to directly access a particular row. If a single column value is missing, a full table scan will result because the row hash cannot be recreated. Any row retrieval using the PI column(s) is always an efficient, one-AMP operation.

Although uniqueness is good in most cases, Teradata does not require that a UPI be used. It also allows for a Non-Unique Primary Index (NUPI, pronounced as new-pea). The potential downside of a NUPI is that if several duplicate values (NUPI dups) are stored, they all go to the same AMP. This can cause an uneven distribution that places more rows on some of the AMPs than on others. This means that any time an AMP with a larger number of rows is involved, it has to work harder than the other AMPs. The other AMPs will finish before the slower AMP. The time to process a single user request is always based on the slowest AMP. Therefore, serious consideration should be used when making the decision to use a NUPI.

Every table must have a PI, and it is established when the table is created. If the CREATE TABLE statement contains UNIQUE PRIMARY INDEX(<column-list>), the value in the column(s) will be distributed to an AMP as a UPI. However, if the statement reads PRIMARY INDEX(<column-list>), the value in the column(s) will be distributed as a NUPI and allow duplicate values. Again, all the same values will go to the same AMP.
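As a sketch of the two forms (the data types and the second table are illustrative, not from the original text):

```sql
CREATE TABLE Student_Table
 (Student_ID  INTEGER
 ,Last_Name   CHAR(20)
 ,First_Name  VARCHAR(12)
 ,Class_Code  CHAR(2)
 ,Grade_Pt    DECIMAL(5,2))
UNIQUE PRIMARY INDEX (Student_ID) ;

-- NUPI version: all rows with the same Class_Code hash to the same AMP
CREATE TABLE Student_Copy
 (Student_ID  INTEGER
 ,Class_Code  CHAR(2))
PRIMARY INDEX (Class_Code) ;
```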

If the DDL statement does not specify a PI, but it specifies a PRIMARY KEY (PK), the named column(s) are used as the UPI. Although Teradata does not use primary keys, the DDL may be ported from another vendor's database system.

A UPI is used because a primary key must be unique and cannot be null. By default, both UPIs and NUPIs allow a null value to be stored, unless the column definition indicates that null values are not allowed using a NOT NULL constraint.

Now, with that being said, when considering JOIN accesses on the tables, sometimes it is advantageous to use a NUPI. This is because the rows being joined between tables must be on the same AMP. If they are not on the same AMP, one of the rows must be moved to the same AMP as the matching row. Teradata will use one of two different strategies to temporarily move rows. It can copy all needed rows to all AMPs, or it can redistribute them using the hashing mechanism on the column defined as the join domain that is a PI. However, if neither join column is a PI, it might be necessary to redistribute all participating rows from both tables by


duplicate values. The logical data model needs to be extended with usage information in order to know the best way to distribute the data rows. This is done during the physical implementation phase, before creating tables.

A Secondary Index (SI) is used in Teradata as a way to directly access rows in the data, sometimes called the base table, without requiring the use of PI values. Unlike the PI, an SI does not affect the distribution of the data rows. Instead, it is an alternate read path and allows for a method to locate the PI value using the SI. Once the PI is obtained, the row can be directly accessed using the PI. Like the PI, an SI can consist of up to 16 columns.

In order for an SI to retrieve the data row by way of the PI, it must store and retrieve an index row. To accomplish this, Teradata creates, maintains and uses a subtable. The PI of the subtable is the value in the column(s) that are defined as the SI. The data stored in the subtable row is the previously hashed value of the real PI for the data row or rows in the base table. The SI is a pointer to the real data row desired by the request. An SI can also be unique (USI, pronounced as you-sea) or non-unique (NUSI, pronounced as new-sea).

The rows of the subtable contain the row hash value of the SI, the actual data value(s) of the SI, and the row hash value of the PI as the row ID. Once the row ID of the PI is obtained from the subtable row, using the hashed value of the SI, the last step is to get the actual data row from the AMP where it is stored. The action and hashing for an SI is exactly the same as when starting with a PI. When using a USI, the access of the subtable is a one-AMP operation, and then accessing the data row from the base table is another one-AMP operation. Therefore, USI accesses are always a two-AMP operation based on two separate row hash operations.

When using a NUSI, the subtable access is always an all-AMP operation. Since the data is distributed by the PI, NUSI duplicate values may exist, and probably do exist, on multiple AMPs. So, the best plan is to go to all AMPs and check for the requested NUSI value. To make this more efficient, each AMP scans its subtable. These subtable rows contain the row hash of the NUSI, the value of the data that created the NUSI, and one or more row IDs for all the PI rows on that AMP. This is still a fast operation because these rows are quite small and several are stored in a single block. If the AMP determines that it contains no rows for the value of the NUSI requested, it is finished with its portion of the request. However, if an AMP has one or more rows with the NUSI value requested, it then goes and retrieves the data rows into spool space using the index.

With this said, the SQL optimizer may decide that there are too many base table data rows to make index access efficient. When this happens, the AMPs will do a full base table scan to locate the data rows and ignore the NUSI. This situation is called a weakly selective NUSI.

    Secondary Index


If the SQL does not use a NUSI, you should consider dropping it, due to the fact that the subtable takes up PERM space with no benefit to the users. The Teradata EXPLAIN is covered in this book, and it is the easiest way to determine if your SQL is using a NUSI. Furthermore, the optimizer will never use a NUSI without STATISTICS.

There has been another evolution in the use of NUSI processing, called NUSI Bitmapping. If a table has two different NUSI indices that individually are weakly selective, but together become highly selective, they can be bitmapped together to eliminate most of the non-conforming rows. Therefore, many times it is better to use smaller individual NUSI indices instead of a large composite (more than one column) NUSI.

There is another feature related to NUSI processing that can improve access time when a value range comparison is requested. When using hash values, it is impossible to determine any value within a range. This is because large data values can generate small hash values and small data values can produce large hash values. So, to overcome the issue associated with a hashed value, there is a range feature called Value Ordered NUSIs. At this time, it may only be used with a four-byte or smaller numeric data column. Based on its functionality, a Value Ordered NUSI is perfect for date processing. See the DDL chapter in this book for more details on USI and NUSI usage.

    Posted 21st March 2013 by pankaj agarwal


    21st March 2013

In Teradata, a user is the same as a database, with one exception: a user is able to logon to the system and a database cannot. Therefore, to authenticate the user, a password must be established. The password is normally established at the same time that the CREATE USER statement is executed. The password can also be changed using a MODIFY USER command.

Like a database, a user area can contain database objects (tables, views, macros and triggers). A user can have PERM and TEMP space and can also have spool space. On the other hand, a user might not have any of these types of space, exactly the same as a database.
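A minimal CREATE USER sketch (the user name, owner and space values are illustrative, not from the original text):

```sql
CREATE USER sql_user
FROM sysdba
AS PERM = 10000000
  ,SPOOL = 20000000
  ,PASSWORD = secret1 ;
```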

Teradata Users


PERMANENT

    TEMPORARY

    SPOOL

    ACCOUNT

    FALLBACK

    JOURNAL

    DEFAULT JOURNAL

    PASSWORD

    STARTUP

    DEFAULT DATABASE

By no means are these all of the parameters. It is not the intent of this chapter, nor of this book, to teach database administration. There are reference manuals and courses available to use. Teradata administration warrants a book by itself.

    http://www.coffingdw.com/sql/tdsqlutp/teradata_users.htm[http://www.coffingdw.com/sql/tdsqlutp/teradata_users.htm]

    Posted 21st March 2013 by pankaj agarwal

    { CREATE | MODIFY } DATABASE or USER (in common)

    { CREATE | MODIFY } USER (only)


    21st March 2013

Within Teradata, a database is a storage location for database objects (tables, views, macros, and triggers). An administrator can use Data Definition Language (DDL) to establish a database by using a CREATE DATABASE command.

A database may have PERMANENT (PERM) space allocated to it. This PERM space defines the maximum amount of disk storage available for the objects created within the database.
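A minimal CREATE DATABASE sketch (the database name, owner and space values are illustrative, not from the original text):

```sql
CREATE DATABASE Sales_DB
FROM DBC
AS PERM = 100000000
  ,SPOOL = 200000000 ;
```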

A Teradata Database


Teradata allocates PERM space to tables, up to the maximum, as rows are inserted. The space is not pre-allocated. Instead, it is allocated as rows are stored in blocks on disk. The maximum block size is defined either at a system level in the DBS Control Record, at the database level, or individually for each table. Like PERM, the block size is a maximum size. Yet, it is only a maximum for blocks that contain multiple rows. By nature, the blocks are variable in length. So, disk space is not pre-allocated; instead, it is allocated on an as-needed basis, one sector (512 bytes) at a time. Therefore, the largest possible wasted disk space in a block is 511 bytes.

A database can also have SPOOL space associated with it. All users who run queries need workspace at some point in time. This SPOOL space is workspace used for the temporary storage of rows during the execution of user SQL statements. Like PERM space, SPOOL is defined as a maximum amount that can be used within a database or by a user. Since PERM is not pre-allocated, unused PERM space is automatically available for use as SPOOL. This maximizes the disk space throughout the system.

It is a common practice in Teradata to have some databases with PERM space that contain only tables. Then, other databases contain only views. These view databases require no PERM space and are the only databases that users have privileges to access. The views in these databases control all access to the real tables in other databases. They insulate the actual tables from user access. There will be more on views later in this book.

The newest type of space allocation within Teradata is TEMPORARY (TEMP) space. A database may or may not have TEMP space; however, it is required if Global Temporary Tables are used. The use of temporary tables is also covered in more detail later in the SQL portion of this book.

A database is defined using a series of parameter values at creation time. The majority of the parameters can easily be changed after a database has been created using the MODIFY DATABASE command. However, when attempting to increase PERM or TEMP space maximums, there must be sufficient disk space available even though it is not immediately allocated. There may not be more PERM space defined than actual disk on the system.
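The constraint described above — you cannot define more PERM than there is physical disk — can be sketched as a simple validation (function name and sizes hypothetical):

```python
def can_increase_perm(new_perm_max, current_perm_max, free_disk):
    """A PERM maximum may only grow if enough unallocated disk exists,
    even though the new space is not immediately allocated."""
    growth = new_perm_max - current_perm_max
    return growth <= free_disk

print(can_increase_perm(new_perm_max=80, current_perm_max=50, free_disk=40))   # True
print(can_increase_perm(new_perm_max=120, current_perm_max=50, free_disk=40))  # False
```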

A number of additional database parameters are listed below along with the user parameters in the next section. These parameters are tools for the database administrator and other experienced users when establishing databases for tables and views.

    PERMANENT

    TEMPORARY

    CREATE / MODIFY DATABASE Parameters


  • JOURNAL

    DEFAULT JOURNAL

    Posted 21st March 2013 by pankaj agarwal


    21st March 2013

The Teradata database currently runs on NCR Corporation's WorldMark Systems in the UNIX MP-RAS environment. Some of these systems consist of a single processing node (computer) while others are several hundred nodes working together in a single system. The NCR nodes are based entirely on industry-standard CPU processor chips, standard internal and external bus architectures like PCI and SCSI, and standard memory modules with 4-way interleaving for speed.

At the same time, Teradata can run on any hardware server in the single-node environment when the system runs Microsoft NT or Windows 2000. This single node may be any computer from a large server to a laptop.

Whether the system consists of a single node or is a massively parallel system with hundreds of nodes, the Teradata RDBMS uses the exact same components executing on all the nodes in parallel. The only difference between small and large systems is the number of processing components.

When these components exist on different nodes, it is essential that the components communicate with each other at high speed. To facilitate the communications, the multi-node systems use the BYNET interconnect. It is a high-speed, multi-path, dual-redundant communications channel. Another amazing capability of the BYNET is that the bandwidth increases with each consecutive node added into the system. There is more detail on the BYNET later in this chapter.

As previously mentioned, Teradata is the superior product today because of its parallel operations based on its architectural design. It is the parallel processing by the major components that provides the power to move mountains of data. Teradata works more like the early Egyptians who built the pyramids without heavy equipment, using parallel, coordinated human efforts. It uses smaller nodes

    Teradata Architecture 1

    Teradata Architecture

    Teradata Components


Processors and the Message Passing Layer. The role of each component is discussed in the next sections to provide a better understanding of Teradata. Once we understand how Teradata works, we will pursue the SQL that allows storage and access of the data.

The Parsing Engine Processor (PEP) or Parsing Engine (PE), for short, is one of the two primary types of processing tasks used by Teradata. It provides the entry point into the database for users on mainframe and networked computer systems. It is the primary director task within Teradata.

As users log on to the database they establish a Teradata session. Each PE can manage 120 concurrent user sessions. Within each of these sessions users submit SQL as a request for the database server to take an action on their behalf. The PE will then parse the SQL statement to establish which database objects are involved. For now, let's assume that the database object is a table. A table is a two-dimensional array that consists of rows and columns. A row represents an entity stored in a table and it is defined using columns. An example of a row might be the sale of an item; its columns include the UPC, a description and the quantity sold.

Any action a user requests must also go through a security check to validate their privileges as defined by the database administrator. Once their authorization at the object level is verified, the PE will verify that the columns requested actually exist within the objects referenced.

Next, the PE optimizes the SQL to create an execution plan that is as efficient as possible based on the amount of data in each table, the indices defined, the type of indices, the selectivity level of the indices, and the number of processing steps needed to retrieve the data. The PE is responsible for passing the optimized execution plan to other components as the best way to gather the data.

An execution plan might use the primary index column assigned to the table, a secondary index or a full table scan. The use of an index is preferable and will be discussed later in this chapter. For now, it is sufficient to say that a full table scan means that all rows in the table must be read and compared to locate the requested data.

Although a full table scan sounds really bad, within the architecture of Teradata, it is not necessarily a bad thing because the data is divided up and distributed to multiple, parallel components throughout the database. We will look next at the AMPs that perform the parallel disk access using their file system logic. The AMPs manage all data storage on disks. The PE has no disks.

    Activities of a PE:

    Parsing Engine Processor (PEP or PE)


• Optimize the access path(s) to retrieve the rows

• Build an execution plan with necessary steps for row access

• Send the plan steps to the Access Module Processors (AMPs) involved

The next major component of Teradata's parallel architecture is called an Access Module Processor (AMP). It stores and retrieves the distributed data in parallel. Ideally, the data rows of each table are distributed evenly across all the AMPs. The AMPs read and write data and are the workhorses of the database. Their job is to receive the optimized plan steps, built by the PE after it completes the optimization, and execute them. The AMPs are designed to work in parallel to complete the request in the shortest possible time.

Optimally, every AMP should contain a subset of all the rows loaded into every table. By dividing up the data, it automatically divides up the work of retrieving the data. Remember, all work comes as a result of a user's SQL request. If the SQL asks for a specific row, that row exists in its entirety (all columns) on a single AMP and other rows exist on the other AMPs.
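A toy sketch of this distribution: each row is assigned to exactly one AMP by hashing its primary index value (Teradata's real row-hash algorithm is internal to the database; `crc32` simply stands in for it here, and the UPC values are made up):

```python
from zlib import crc32

NUM_AMPS = 4

def amp_for(primary_index_value):
    """Map a primary index value to one AMP (stand-in for Teradata's row hash)."""
    return crc32(str(primary_index_value).encode()) % NUM_AMPS

# Distribute rows (keyed by a UPC-like primary index) across the AMPs.
amps = {n: [] for n in range(NUM_AMPS)}
for upc in range(1000, 1020):
    amps[amp_for(upc)].append(upc)

# A specific row lives entirely on one AMP; an all-rows scan touches every AMP.
print(amp_for(1005))                   # the single AMP holding UPC 1005
print(sorted(sum(amps.values(), [])))  # all AMPs together hold every row once
```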

If the user request asks for all of the rows in a table, every AMP should participate along with all the other AMPs to complete the retrieval of all rows. This type of processing is called an all-AMP operation and an all-rows scan. However, each AMP is only responsible for its rows, not the rows that belong to a different AMP. As far as each AMP is concerned, it owns all of the rows. Within Teradata, the AMP environment is a shared-nothing configuration. The AMPs cannot access each other's data rows, and there is no need for them to do so.

Once the rows have been selected, the last step is to return them to the client program that initiated the SQL request. Since the rows are scattered across multiple AMPs, they must be consolidated before reaching the client. This consolidation process is accomplished as a part of the transmission to the client so that a final comprehensive sort of all the rows is never performed. Instead, all AMPs sort only their rows (at the same time in parallel) and the Message Passing Layer is used to merge the rows as they are transmitted from all the AMPs.

Therefore, when a client wishes to sequence the rows of an answer set, this technique causes the sort of all the rows to be done in parallel. Each AMP sorts only its subset of the rows at the same time all the other AMPs sort their rows. Once all of the individual sorts are complete, the BYNET merges the sorted rows. Pretty brilliant!
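The sort-locally-then-merge technique can be sketched in a few lines (`heapq.merge` plays the role of the BYNET's merge of pre-sorted streams; the row values are made up):

```python
import heapq

# Each AMP holds and sorts only its own subset of rows (hypothetical data).
amp_rows = [[9, 2, 7], [4, 1, 8], [6, 3, 5]]
sorted_per_amp = [sorted(rows) for rows in amp_rows]  # done in parallel on real AMPs

# The merge happens during transmission; no single global sort is ever run.
answer_set = list(heapq.merge(*sorted_per_amp))
print(answer_set)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Merging k already-sorted streams is far cheaper than one comprehensive sort, which is why the final merge can be done on the fly as rows leave the AMPs.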

    Activities of the AMP:

    Store and retrieve data rows using the file system

    Access Module Processor (AMP)


  • Sort and format output data

The Message Passing Layer varies depending on the specific hardware on which the Teradata database is executing. In the latter part of the 20th century, most Teradata database systems executed under the UNIX operating system. However, in 1998, Teradata was released on Microsoft's NT operating system. Today it also executes under Windows 2000. The initial release of Teradata on the Microsoft systems is for a single node.

When using the UNIX operating system, Teradata supports up to 512 nodes. This massively parallel system establishes the basis for storing and retrieving data from the largest commercial databases in the world. Today, the largest system in the world consists of 176 nodes. There is much room for growth as the databases begin to exceed 40 or 50 terabytes.

For the NCR UNIX systems, the Message Passing Layer is called the BYNET. The amazing thing about the BYNET is its capacity. Instead of a fixed bandwidth that is shared among multiple nodes, the bandwidth of the BYNET increases as the number of nodes increases. This feat is accomplished as a result of using virtual circuits instead of using a single fixed cable or a twisted-pair configuration.

To understand the workings of the BYNET, think of a telephone switch used by local and long distance carriers. As more and more people place phone calls, no one needs to speak slower. As one switch becomes saturated, another switch is automatically used. When your phone call is routed through a different switch, you do not need to speak slower. If a natural or other type of disaster occurs and a switch is destroyed, all subsequent calls are routed through other switches. The BYNET is designed to work like a telephone switching network.

An additional aspect of the BYNET is that it is really two connection paths, like having two phone lines for a business. The redundancy allows for two different aspects of its performance. The first aspect is speed. Each path of the BYNET provides bandwidth of 10 megabytes (MB) per second with Version 1 and 60 MB per second with Version 2. Therefore the aggregate speed of the two connections is 20 MB/second or 120 MB/second. However, as mentioned earlier, the bandwidth grows linearly as more nodes are added. Using Version 1, any two nodes communicate at 40 MB/second (10 MB/second * 2 BYNETs * 2 nodes). Therefore, 10 nodes can utilize 200 MB/second and 100 nodes have 2000 MB/second available between them. When using the Version 2 BYNET, the same 100 nodes communicate at 12,000 MB/second (60 MB/second * 2 BYNETs * 100 nodes).
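The arithmetic in that paragraph can be written out directly (the per-path rates are the figures from the text):

```python
# Per-path bandwidth in MB/second, as given in the text.
BYNET_V1, BYNET_V2 = 10, 60
PATHS = 2  # dual-redundant connections

def aggregate_bandwidth(per_path_mb, nodes):
    """Total MB/second available among `nodes`, per the text's scaling rule."""
    return per_path_mb * PATHS * nodes

print(aggregate_bandwidth(BYNET_V1, 2))    # 40
print(aggregate_bandwidth(BYNET_V1, 100))  # 2000
print(aggregate_bandwidth(BYNET_V2, 100))  # 12000
```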

The second and equally important aspect of the BYNET uses the two connections for availability. Regardless of the speed associated with each BYNET connection, if one of the connections should fail, the second is completely independent and can

    Message Passing Layer (BYNET)


many normal networks that typically transfer messages at 10 MB per second.

All messages going across the BYNET offer guaranteed delivery. So, any messages not successfully delivered because of a failure on one connection automatically route across the other connection. Since half of the BYNET is not working, the bandwidth reduces by half. However, when the failed connection is returned to service, its topology is automatically configured back into service and it begins transferring messages along with the other connection. Once this occurs, the capacity returns to normal.

    Posted 21st March 2013 by pankaj agarwal


21st March 2013

AMPs

The Access Module Process (AMP) is the heart of the Teradata RDBMS. The Access Module Process is a virtual processor (vproc) that provides a BYNET interface and performs many database and file management tasks. AMPs control the management of the Teradata RDBMS and also provide control over the disk subsystem, with each AMP being assigned to a virtual disk.

Each AMP controls the following set of functions:

- BYNET (or Boardless BYNET) interface
- Database manager
- Locking
- Joins
- Sorting
- Aggregation
- Output data conversion
- Disk space management
- Accounting
- Journaling
- File system and disk management

    Access Module Process AMP in TD


1. Lock the table.
2. Execute the operation requested.
3. End the transaction.
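Those three steps can be sketched as a context-manager pattern (all names here are hypothetical; Teradata performs this sequencing internally, this merely illustrates the lock / execute / end ordering):

```python
from contextlib import contextmanager

@contextmanager
def table_transaction(table, log):
    """Hypothetical sketch of the AMP's lock / execute / end sequence."""
    log.append(f"LOCK {table}")        # 1. Lock the table
    try:
        yield                          # 2. Execute the operation requested
    finally:
        log.append("END TRANSACTION")  # 3. End the transaction

log = []
with table_transaction("Sales", log):
    log.append("UPDATE Sales ...")
print(log)  # ['LOCK Sales', 'UPDATE Sales ...', 'END TRANSACTION']
```

The `finally` clause mirrors the guarantee that the transaction is ended whether the operation succeeds or fails.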

    Posted 21st March 2013 by pankaj agarwal


    20th March 2013

    Architecture of Teradata RDBMS

Teradata is designed using a shared-nothing architecture. Each processing unit processes its own unit of data in parallel. Teradata systems can be either SMP (Symmetric Multi-Processing) or MPP (Massively Parallel Processing). In simple words, an SMP system is a single-node system whereas an MPP system has two or more nodes working in parallel.

Teradata architecture contains the following components:

    1) Node

    2) VPROC

    3) PE

    4) AMP

    5) BYNET

    Architecture Components

    Node

The basic building block for a Teradata system, the node is where the processing occurs for the database. A node is simply a collection of many hardware and software components.

    PDE

The PDE (Parallel Database Extensions) software layer runs on the operating system of each node. It was created by NCR to support the parallel environment.

    Teradata RDBMS Components


A node includes:

- Operating system software
- Teradata software
- Application software
- System dump space

    Teradata database tables are stored on disk arrays, not on the system disks.

    Memory

Vprocs share a free memory pool within a node. A segment of memory is allocated to a vproc for its use, then returned to the memory pool for use by another vproc. The free memory pool is a collection of memory available to the node.

    Vproc

A virtual processor or a vproc is a group of one or more software processes running under the operating system's multi-tasking environment:

- On the UNIX operating system, a vproc is a collection of software processes.
- On the Windows operating systems, a vproc is a single software process.

    The two types of Teradata vprocs are:

- AMP (Access Module Processor)
- PE (Parsing Engine)

When vprocs communicate, they use BYNET hardware (on MPP systems), BYNET software, and PDE. The BYNET hardware and software carry vproc messages to and from a particular node. Within a node, the BYNET and PDE software deliver messages to and from the participating vprocs.

    PE

PEs (Parsing Engines) are vprocs that receive SQL requests from the client and break the requests into steps. The PEs send the steps to the AMPs and subsequently return the answer to the client.

    AMP

AMPs (Access Module Processors) are virtual processors (vprocs) that receive steps from PEs (Parsing Engines) and perform database functions to retrieve or update data. Each AMP is associated with one virtual disk (vdisk), where the data is stored. An AMP manages only its own vdisk, not the vdisk of any other AMP.


The vdisk is made up of 1 to 64 pdisks (user slices in UNIX or partitions in Windows NT, whose size and configuration vary based on RAID level). The pdisks logically combine to comprise the AMP's vdisk. Although an AMP can manage up to 64 pdisks, it controls only one vdisk. An AMP manages only its own vdisk, not the vdisk of any other AMP.
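A minimal sketch of the vdisk-to-pdisk relationship described above (the 1-to-64 bound is from the text; the pdisk sizes are hypothetical):

```python
MAX_PDISKS = 64  # upper bound stated in the text

def vdisk_size(pdisk_sizes_gb):
    """An AMP's single vdisk is the logical combination of 1 to 64 pdisks."""
    if not 1 <= len(pdisk_sizes_gb) <= MAX_PDISKS:
        raise ValueError("a vdisk is made up of 1 to 64 pdisks")
    return sum(pdisk_sizes_gb)

# Three pdisks logically combine into one 250 GB vdisk for one AMP.
print(vdisk_size([100, 100, 50]))  # 250
```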

    BYNET

The BYNET (banyan network) is a combination of hardware and software that provides high-performance networking between the nodes of a Teradata system. A dual-redundant, bi-directional, multi-staged network, the BYNET enables the nodes to communicate in a high-speed, loosely-coupled fashion. It is based on banyan topology, a mathematically defined structure that has branches reminiscent of a banyan tree.

The BYNET is a high-speed interconnect (network) that enables multiple nodes in the system to communicate.

    The BYNET hardware and software handle the communication between the vprocs.

- Hardware: The nodes of an MPP system are connected with the BYNET hardware, consisting of BYNET boards and cables.
- Software: The BYNET software is installed on every node. This BYNET driver is an interface between the PDE software and the BYNET hardware.

SMP systems do not contain BYNET hardware. The PDE and BYNET software emulate BYNET activity in a single-node environment. The SMP implementation is sometimes called "boardless BYNET."


[Figure: Teradata architecture diagram — http://4.bp.blogspot.com/-14j5RcK_QIo/UUnlAZjUL6I/AAAAAAAADsQ/hl3mDm8W5uU/s1600/TD_Architcture.png]

    Posted 20th March 2013 by pankaj agarwal

